(57) The present invention provides DNA molecules
that constitute fragments of the genome of a plant, and
polypeptides encoded thereby. The DNA molecules are
useful for specifying a gene product in cells, either as a
promoter or as a protein coding sequence or as a UTR
or as a 3' termination sequence, and are also useful in
controlling the behavior of a gene in the chromosome,
in controlling the expression of a gene or as tools for
genetic mapping, recognizing or isolating identical or
related DNA fragments, or identification of a particular
individual organism, or for clustering of a group of
organisms with a common trait.

Arabidopsis DNA is used in the present experiment, but the procedure is a general one.

Description

FIELD OF THE INVENTION 

[0001] The present invention relates to isolated polynucleotides that represent a complete gene, or a fragment
thereof, that is expressed. In addition, the present invention relates to the polypeptide or protein corresponding to the
coding sequence of these polynucleotides. The present invention also relates to isolated polynucleotides that represent
regulatory regions of genes. The present invention also relates to isolated polynucleotides that represent untranslated
regions of genes. The present invention further relates to the use of these isolated polynucleotides and polypeptides
and proteins.

DESCRIPTION OF THE RELATED ART 

[0002] Efforts to map and sequence the genome of a number of organisms are in progress; a few complete genome
sequences, for example those of E. coli and Saccharomyces cerevisiae, are known (Blattner et al., Science 277:1453
(1997); Goffeau et al., Science 274:546 (1996)). The complete genome of a multicellular organism, C. elegans, has
also been sequenced (See, the C. elegans Sequencing Consortium, Science 282:2012 (1998)). To date, no complete
genome of a plant has been sequenced, nor has a complete cDNA complement of any plant been sequenced.

SUMMARY OF THE INVENTION

[0003] The present invention comprises polynucleotides, such as complete cDNA sequences and/or sequences of
genomic DNA encompassing complete genes, fragments of genes, and/or regulatory elements of genes and/or regions
with other functions and/or intergenic regions, hereinafter collectively referred to as Sequence-Determined DNA
Fragments (SDFs), from different plant species, particularly corn, wheat, soybean, rice and Arabidopsis thaliana, and other
plants, and/or mutants, variants, fragments or fusions of said SDFs and polypeptides or proteins derived therefrom. In
some instances, the SDFs span the entirety of a protein-coding segment. In some instances, the entirety of an mRNA
is represented. Other objects of the invention that are also represented by SDFs of the invention are control sequences,
such as, but not limited to, promoters. Complements of any sequence of the invention are also considered part of the
invention.

[0004] Other objects of the invention are polynucleotides comprising exon sequences, polynucleotides comprising
intron sequences, polynucleotides comprising introns together with exons, intron/exon junction sequences, 5'
untranslated sequences, and 3' untranslated sequences of the SDFs of the present invention. Polynucleotides representing
the joinder of any exons described herein, in any arrangement, for example, to produce a sequence encoding any
desirable amino acid sequence, are within the scope of the invention.

[0005] The present invention also resides in probes useful for isolating and identifying nucleic acids that hybridize
to an SDF of the invention. The probes can be of any length, but more typically are 12-2000 nucleotides in length;
more typically, 15 to 200 nucleotides long; even more typically, 18 to 100 nucleotides long.

[0006] Yet another object of the invention is a method of isolating and/or identifying nucleic acids using the following
steps:

(a) contacting a probe of the instant invention with a polynucleotide sample under conditions that permit
hybridization and formation of a polynucleotide duplex; and
(b) detecting and/or isolating the duplex of step (a).


[0007] The conditions for hybridization can be from low to moderate to high stringency conditions. The sample can
include a polynucleotide having a sequence unique in a plant genome. Probes and methods of the invention are useful,
for example, without limitation, for mapping of genetic traits and/or for positional cloning of a desired fragment of
genomic DNA.

[0008] Probes and methods of the invention can also be used for detecting alternatively spliced messages within a
species. Probes and methods of the invention can further be used to detect or isolate related genes in other plant
species using genomic DNA (gDNA) and/or cDNA libraries. In some instances, especially when longer probes and low
to moderate stringency hybridization conditions are used, the probe will hybridize to a plurality of cDNA and/or gDNA
sequences of a plant. This approach is useful for isolating representatives of gene families which are identifiable by
possession of a common functional domain in the gene product or which have common cis-acting regulatory sequences.
This approach is also useful for identifying orthologous genes from other organisms.

[0009] The present invention also resides in constructs for modulating the expression of the genes comprised of all
or a fragment of an SDF. The constructs comprise all or a fragment of the expressed SDF, or of a complementary
sequence. Examples of constructs include ribozymes comprising RNA encoded by an SDF or by a sequence
complementary thereto, antisense constructs, constructs comprising coding regions or parts thereof, constructs comprising
promoters, introns, untranslated regions, scaffold attachment regions, methylating regions, enhancing or reducing
regions, DNA and chromatin conformation modifying sequences, etc. Such constructs can be constructed using viral,
plasmid, bacterial artificial chromosomes (BACs), plasmid artificial chromosomes (PACs), autonomous plant plasmids,
plant artificial chromosomes or other types of vectors and exist in the plant as autonomous replicating sequences or
as DNA integrated into the genome. When inserted into a host cell the construct is, preferably, functionally integrated
with, or operatively linked to, a heterologous polynucleotide. For instance, a coding region from an SDF might be
operably linked to a promoter that is functional in a plant.

[0010] The present invention also resides in host cells, including bacterial or yeast cells or plant cells, and plants
that harbor constructs such as described above. Another aspect of the invention relates to methods for modulating
expression of specific genes in plants by expression of the coding sequence of the constructs, by regulation of
expression of one or more endogenous genes in a plant or by suppression of expression of the polynucleotides of the invention
in a plant. Methods of modulation of gene expression include without limitation (1) inserting into a host cell additional
copies of a polynucleotide comprising a coding sequence; (2) modulating an endogenous promoter in a host cell; (3)
inserting antisense or ribozyme constructs into a host cell; and (4) inserting into a host cell a polynucleotide comprising
a sequence encoding a variant, fragment, or fusion of the native polypeptides of the instant invention.

BRIEF DESCRIPTION OF THE TABLES 

[0011] The sequences of exemplary SDFs and polypeptides corresponding to the coding sequences of the instant
invention are described in Reference Tables 1 and 2, "REF Tables 1 and 2", and in Sequence Tables 1 and 2, "SEQ
Tables 1 and 2." The REF Tables refer to a number of "Maximum Length Sequences" or "MLS." Each MLS corresponds
to the longest cDNA obtained, either by cloning or by the prediction from genomic sequence. The sequence of the
MLS is the cDNA sequence as described in the (Ac) subsection of the REF Tables.
[0012] The REF Table includes the following information relating to each MLS:

I. cDNA Sequence

A. 5' UTR
B. Coding Sequence
C. 3' UTR

II. Genomic Sequence

A. Exons
B. Introns
C. Promoters

III. Link of cDNA Sequences to Clone IDs

IV. Multiple Transcription Start Sites

V. Polypeptide Sequences

A. Signal Peptide
B. Domains
C. Related Polypeptides

VI. Related Polynucleotide Sequences

I. cDNA SEQUENCE

[0013] The REF Tables indicate which sequence in the SEQ Tables represents the sequence of each MLS. The MLS
sequence can comprise 5' and 3' UTR as well as coding sequences. In addition, specific cDNA clone numbers also
are included in the REF Tables when the MLS sequence relates to a specific cDNA clone.

A. 5' UTR

[0014] The location of the 5' UTR can be determined by comparing the most 5' MLS sequence with the corresponding
genomic sequence as indicated in the REF Tables. The sequence that matches, beginning at any of the transcriptional
start sites and ending at the last nucleotide before any of the translational start sites, corresponds to the 5' UTR.
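
The 5' UTR rule above can be illustrated with a minimal Python sketch. It assumes 1-based positions within the MLS for both the transcription start site and the translational start site; the function name `five_prime_utr` and the toy sequence are hypothetical and are not taken from the REF Tables.

```python
def five_prime_utr(mls, tss, translation_start):
    """Return the 5' UTR of an MLS.

    Positions are assumed to be 1-based within the MLS. The UTR begins at
    the transcription start site (tss) and ends at the last nucleotide
    before the translational start site.
    """
    if not (1 <= tss <= translation_start <= len(mls)):
        raise ValueError("positions must satisfy 1 <= tss <= translation_start <= len(mls)")
    # Convert the 1-based inclusive start to a 0-based slice; the slice end
    # excludes the first base of the start codon.
    return mls[tss - 1 : translation_start - 1]


# Hypothetical toy MLS whose start codon (ATG) begins at position 7.
toy_mls = "GGAACCATGGCTTAA"
utr_from_first_tss = five_prime_utr(toy_mls, tss=1, translation_start=7)
utr_from_second_tss = five_prime_utr(toy_mls, tss=3, translation_start=7)
print(utr_from_first_tss, utr_from_second_tss)
```

Because an MLS can list several transcription start sites, the same call can be repeated for each site to obtain each candidate 5' UTR.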

B. Coding Region

[0015] The coding region is the sequence in any open reading frame found in the MLS. Coding regions of interest
are indicated in the PolyP SEQ subsection of the REF Tables.

C. 3' UTR

[0016] The location of the 3' UTR can be determined by comparing the most 3' MLS sequence with the corresponding
genomic sequence as indicated in the REF Tables. The sequence that matches, beginning at the translational stop
site and ending at the last nucleotide of the MLS, corresponds to the 3' UTR.

II. GENOMIC SEQUENCE

[0017] Further, the REF Tables indicate the specific "gi" number of the genomic sequence if the sequence resides in
a public databank. For each genomic sequence, the REF Tables indicate which regions are included in the MLS. These
regions can include the 5' and 3' UTRs as well as the coding sequence of the MLS. See, for example, the scheme below.



[Scheme not reproduced; recoverable labels: UTR, Exon]



[0018] The REF Tables report the first and last base of each region that are included in an MLS sequence. An example
is shown below:

gi No. 47000:
37102...37497
37593...37925

The numbers indicate that the MLS contains the following sequences from two regions of gi No. 47000: a first region
including bases 37102-37497, and a second region including bases 37593-37925.
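
As an illustration only, the region listing in the example above can be read programmatically. The following Python sketch assumes each region is written as `first...last` on its own line; the helper names `parse_regions` and `region_length` are hypothetical.

```python
import re


def parse_regions(lines):
    """Parse lines such as '37102...37497' into (first_base, last_base) tuples."""
    regions = []
    for line in lines:
        match = re.fullmatch(r"\s*(\d+)\s*\.{3}\s*(\d+)\s*", line)
        if match:
            regions.append((int(match.group(1)), int(match.group(2))))
    return regions


def region_length(first, last):
    """Number of bases in an inclusive region, whichever strand it lies on."""
    return abs(last - first) + 1


# The two regions of gi No. 47000 given in the example.
entry = ["37102...37497", "37593...37925"]
regions = parse_regions(entry)
total_bases = sum(region_length(first, last) for first, last in regions)
print(regions)
print(total_bases)
```

Here the two regions contribute 396 and 333 bases, so the MLS in this example draws 729 bases from the genomic sequence.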

A. EXON SEQUENCES

[0019] The location of the exons can be determined by comparing the sequence of the regions from the genomic
sequences with the corresponding MLS sequence as indicated by the REF Tables.

i. INITIAL EXON

[0020] To determine the location of the initial exon, information from the

(1) polypeptide sequence section;
(2) cDNA polynucleotide section; and
(3) the genomic sequence section

of the REF Tables are used. First, the polypeptide section will indicate where the translational start site is located in
the MLS sequence. The MLS sequence can be matched to the genomic sequence that corresponds to the MLS. Based
on the match between the MLS and corresponding genomic sequences, the location of the translational start site can
be determined in one of the regions of the genomic sequence. The location of this translational start site is the start of
the first exon.

[0021] Generally, the last base of the exon of the corresponding genomic region, in which the translational start site
was located, will represent the end of the initial exon. In some cases, the initial exon will end with a stop codon, when
the initial exon is the only exon.

[0022] In the case when sequences representing the MLS are in the positive strand of the corresponding genomic
sequence, the last base will be a larger number than the first base. When the sequences representing the MLS are in
the negative strand of the corresponding genomic sequence, then the last base will be a smaller number than the first
base.
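
The strand rule in the preceding paragraph amounts to a comparison of the two reported coordinates. A minimal Python sketch follows; the function name `strand_of` is hypothetical, and the treatment of a single-base region as undetermined is an assumption, since the text does not address that case.

```python
def strand_of(first_base, last_base):
    """Infer the strand of a region from its reported first and last base.

    A last base larger than the first base indicates the positive strand;
    a smaller last base indicates the negative strand.
    """
    if last_base > first_base:
        return "+"
    if last_base < first_base:
        return "-"
    # Assumption: a one-base region carries no strand information.
    return None


positive_example = strand_of(37102, 37497)
negative_example = strand_of(37497, 37102)
print(positive_example, negative_example)
```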

ii. INTERNAL EXONS

[0023] Except for the regions that comprise the 5' and 3' UTRs, initial exon and terminal exon, the remaining genomic
regions that match the MLS sequence are the internal exons. Specifically, the bases defining the boundaries of the
remaining regions also define the intron/exon junctions of the internal exons.

[0024] As with the initial exon, the location of the terminal exon is determined with information from the

(1) polypeptide sequence section;
(2) cDNA polynucleotide section; and
(3) the genomic sequence section

of the REF Tables. The polypeptide section will indicate where the stop codon is located in the MLS sequence. The
MLS sequence can be matched to the corresponding genomic sequence. Based on the match between MLS and
corresponding genomic sequences, the location of the stop codon can be determined in one of the regions of the
genomic sequence. The location of this stop codon is the end of the terminal exon. Generally, the first base of the exon
of the corresponding genomic region that matches the cDNA sequence, in which the stop codon was located, will
represent the beginning of the terminal exon. In some cases, the translational start site will represent the start of the
terminal exon, which will be the only exon.

[0025] In the case when the MLS sequences are in the positive strand of the corresponding genomic sequence, the
last base will be a larger number than the first base. When the MLS sequences are in the negative strand of the
corresponding genomic sequence, then the last base will be a smaller number than the first base.

B. INTRON SEQUENCES

[0026] In addition, the introns corresponding to the MLS are defined by identifying the genomic sequence located
between the regions where the genomic sequence comprises exons. Thus, introns are defined as starting one base
downstream of a genomic region comprising an exon and ending one base upstream from a genomic region comprising
an exon.
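
The intron definition above can be sketched in Python as follows. This is an illustration under stated assumptions: exon regions are given as (first base, last base) pairs in transcript order, the strand is inferred from the first region, and the name `introns_between` is hypothetical.

```python
def introns_between(exon_regions):
    """Derive intron coordinates from consecutive exon-containing regions.

    exon_regions: list of (first_base, last_base) tuples in transcript order.
    On the positive strand last_base > first_base; on the negative strand
    the coordinates decrease. Each intron starts one base downstream of one
    region and ends one base upstream of the next region.
    """
    if not exon_regions:
        return []
    first, last = exon_regions[0]
    # Assumption: a single-base first region is treated as positive strand.
    step = 1 if last >= first else -1
    introns = []
    for (_, prev_last), (next_first, _) in zip(exon_regions, exon_regions[1:]):
        start = prev_last + step
        end = next_first - step
        # Skip directly adjacent regions, which leave no intron between them.
        if (end - start) * step >= 0:
            introns.append((start, end))
    return introns


plus_strand_introns = introns_between([(37102, 37497), (37593, 37925)])
minus_strand_introns = introns_between([(37925, 37593), (37497, 37102)])
print(plus_strand_introns, minus_strand_introns)
```

Applied to the gi No. 47000 example, the single intron spans bases 37498 to 37592 on the positive strand.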

C. PROMOTER SEQUENCES

[0027] As indicated below, promoter sequences corresponding to the MLS are defined as sequences upstream of
the first exon; more usually, as sequences upstream of the first of multiple transcription start sites; even more usually
as sequences about 2,000 nucleotides upstream of the first of multiple transcription start sites.
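
As a rough illustration of the "about 2,000 nucleotides upstream" convention, the following Python sketch computes a promoter window in genomic coordinates. The function name `promoter_window`, the 1-based coordinate system, and the choice to report the window as (low, high) on both strands are assumptions made for this example.

```python
def promoter_window(first_tss, strand="+", length=2000):
    """Return (low, high) genomic coordinates of the region upstream of a TSS.

    first_tss is the 1-based genomic position of the first transcription
    start site. On the positive strand, upstream means smaller coordinates;
    on the negative strand, upstream means larger coordinates.
    """
    if strand == "+":
        # Clamp at base 1 so the window never runs off the sequence start.
        low = max(1, first_tss - length)
        return (low, first_tss - 1)
    return (first_tss + 1, first_tss + length)


plus_window = promoter_window(37102)
minus_window = promoter_window(37925, strand="-")
clamped_window = promoter_window(500)
print(plus_window, minus_window, clamped_window)
```

The sketch does not check the upper bound against the length of the genomic sequence; a caller working with a real sequence would need to clamp that end as well.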


III. LINK of cDNA SEQUENCES to CLONE IDs

[0028] As noted above, the REF tables identify the cDNA clone(s) that relate to each MLS. The MLS sequence can
be longer than the sequences included in the cDNA clones. In such a case, the REF table indicates the region of the
MLS that is included in the clone. If either the 5' or 3' termini of the cDNA clone sequence is the same as the MLS
sequence, no mention will be made.

IV. Multiple Transcription Start Sites

[0029] Initiation of transcription can occur at a number of sites of the gene. The REF tables indicate the possible
multiple transcription sites for each gene. In the REF tables, the location of the transcription start sites can be either
a positive or negative number. The positions indicated by positive numbers refer to the transcription start sites as
located in the MLS sequence. The negative numbers indicate the transcription start site within the genomic sequence
that corresponds to the MLS.

[0030] To determine the location of the transcription start sites with the negative numbers, the MLS sequence is
aligned with the corresponding genomic sequence. In the instances when a public genomic sequence is referenced,
the relevant corresponding genomic sequence can be found by direct reference to the nucleotide sequence indicated
by the "gi" number shown in the public genomic DNA section of the REF tables. When the position is a negative number,
the transcription start site is located in the corresponding genomic sequence upstream of the base that matches the
beginning of the MLS sequence in the alignment. The negative number is relative to the first base of the MLS sequence
which matches the genomic sequence corresponding to the relevant "gi" number.
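
A minimal Python sketch of how a negative transcription start site position might be converted to a genomic coordinate is given below. It assumes that -1 denotes the base immediately upstream of the first aligned MLS base (the text does not state whether a zero position exists), and the function name `tss_genomic_position` is hypothetical.

```python
def tss_genomic_position(tss, mls_start_in_genome, strand="+"):
    """Map a negative TSS position to a genomic coordinate.

    tss: negative number from the REF tables, relative to the first MLS base.
    mls_start_in_genome: genomic coordinate aligned to the first MLS base.
    Assumption: -1 is the base immediately upstream of that first base.
    """
    if tss >= 0:
        raise ValueError("positive positions refer to the MLS itself, not the genome")
    offset = -tss
    if strand == "+":
        # Upstream on the positive strand means smaller genomic coordinates.
        return mls_start_in_genome - offset
    # Upstream on the negative strand means larger genomic coordinates.
    return mls_start_in_genome + offset


plus_strand_tss = tss_genomic_position(-25, 37102)
minus_strand_tss = tss_genomic_position(-25, 37925, strand="-")
print(plus_strand_tss, minus_strand_tss)
```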

[0031] In the instances when no public genomic DNA is referenced, the relevant nucleotide sequence for alignment
is the nucleotide sequence associated with the amino acid sequence designated by "gi" number of the later PolyP SEQ
subsection.

[0032] The PolyP SEQ subsection lists SEQ ID NOs and Ceres SEQ ID NO for polypeptide sequences corresponding
to the coding sequence of the MLS sequence and the location of the translational start site within the coding sequence
of the MLS sequence.

[0033] The MLS sequence can have multiple translational start sites and can be capable of producing more than
one polypeptide sequence.

A. Signal Peptide

[0034] The REF Tables also indicate in subsection (B) the cleavage site of the putative signal peptide of the
polypeptide corresponding to the coding sequence of the MLS sequence. Typically, signal peptide coding sequences comprise
a sequence encoding the first residue of the polypeptide to the cleavage site residue.

B. Domains

[0035] Subsection (C) provides information regarding identified domains (where present) within the polypeptide and
(where present) a name for the polypeptide domain.

[0036] Subsection (Dp) provides (where present) information concerning amino acid sequences that are found to be
related and have some percentage of sequence identity to the polypeptide sequences of REF and SEQ TABLES 1
AND 2. These related sequences are identified by a "gi" number.

[0037] Subsection (Dn) provides polynucleotide sequences (where present) that are related to and have some
percentage of sequence identity to the MLS or corresponding genomic sequence.



Abbreviation                      Description

Max Len. Seq.                     Maximum Length Sequence
rel to                            Related to
Clone Ids                         Clone ID numbers
Pub gDNA                          Public Genomic DNA
gi No.                            gi number
Gen. seq. in cDNA                 Genomic Sequence in cDNA (Each region for a single gene prediction is
                                  listed on a separate line. In the case of multiple gene predictions, the
                                  group of regions relating to a single prediction are separated by a
                                  blank line)
(Ac) cDNA SEQ                     cDNA sequence
- Pat. Appln. SEQ ID NO           Patent Application SEQ ID NO:
- Ceres SEQ ID NO: 1673877        Ceres SEQ ID NO:
- SEQ # w. TSS                    Location within the cDNA sequence, SEQ ID NO:, of Transcription Start
                                  Sites which are listed below
- Clone ID #: # -> #              Clone ID comprises bases # to # of the cDNA Sequence
PolyP SEQ                         Polypeptide Sequence
- Pat. Appln. SEQ ID NO           Patent Application SEQ ID NO
- Ceres SEQ ID NO                 Ceres SEQ ID NO:
- Loc. SEQ ID NO @ nt             Location of translational start site in cDNA of SEQ ID NO: at nucleotide
                                  number
(C) Pred. PP Nom. & Annot.        Nomination and Annotation of Domains within Predicted Polypeptide(s)
- (Title)                         Name of Domain
- Loc. SEQ ID NO #: # -> # aa.    Location of the domain within the polypeptide of SEQ ID NO: from # to #
                                  amino acid residues.
(Dp) Rel. AA SEQ                  Related Amino Acid Sequences
- Align. NO                       Alignment number
- gi No                           gi number
- Desp.                           Description
- % Idnt.                         Percent identity
- Align. Len.                     Alignment Length
- Loc. SEQ ID NO: # -> # aa       Location within SEQ ID NO: from # to # amino acid residue.



[0038] The invention relates to (I) polynucleotides and methods of use thereof, such as

IA. Probes, Primers and Substrates;
IB. Methods of Detection and Isolation;

B.1. Hybridization;
B.2. Methods of Mapping;
B.3. Southern Blotting;
B.4. Isolating cDNA from Related Organisms;
B.5. Isolating and/or Identifying Orthologous Genes;

IC. Methods of Inhibiting Gene Expression;

C.1. Antisense;
C.2. Ribozyme Constructs;
C.3. Chimeraplasts;
C.4. Co-Suppression;
C.5. Transcriptional Silencing;
C.6. Other Methods to Inhibit Gene Expression;

ID. Methods of Functional Analysis;
IE. Promoter Sequences and Their Use;
IF. UTRs and/or Intron Sequences and Their Use; and
IG. Coding Sequences and Their Use.

[0039] The invention also relates to (II) polypeptides and proteins and methods of use thereof, such as

IIA. Native Polypeptides and Proteins

A.1 Antibodies
A.2 In Vitro Applications

IIB. Polypeptide Variants, Fragments and Fusions

B.1 Variants
B.2 Fragments
B.3 Fusions

[0040] The invention also includes (III) methods of modulating polypeptide production, such as

IIIA. Suppression

A.1 Antisense
A.2 Ribozymes
A.3 Co-suppression
A.4 Insertion of Sequences into the Gene to be Modulated
A.5 Promoter Modulation
A.6 Expression of Genes containing Dominant-Negative Mutations

IIIB. Enhanced Expression

B.1 Insertion of an Exogenous Gene
B.2 Promoter Modulation

[0041] The invention further concerns (IV) gene constructs and vector construction, such as 

IVA. Coding Sequences 
IVB. Promoters
IVC. Signal Peptides

[0042] The invention still further relates to
V. Transformation Techniques

Definitions 

[0043] Allelic variant An "allelic variant" is an alternative form of the same SDF, which resides at the same
chromosomal locus in the organism. Allelic variations can occur in any portion of the gene sequence, including regulatory
regions. Allelic variants can arise by normal genetic variation in a population. Allelic variants can also be produced by
genetic engineering methods. An allelic variant can be one that is found in a naturally occurring plant, including a
cultivar or ecotype. An allelic variant may or may not give rise to a phenotypic change, and may or may not be expressed.
An allele can result in a detectable change in the phenotype of the trait represented by the locus. A phenotypically
silent allele can give rise to a product.

[0044] Alternatively spliced messages Within the context of the current invention, "alternatively spliced
messages" refers to mature mRNAs originating from a single gene with variations in the number and/or identity of exons,
introns and/or intron-exon junctions.

[0045] Chimeric The term "chimeric" is used to describe genes, as defined supra, or constructs wherein at least
two of the elements of the gene or construct, such as the promoter and the coding sequence and/or other regulatory
sequences and/or filler sequences and/or complements thereof, are heterologous to each other.
[0046] Constitutive Promoter Promoters referred to herein as "constitutive promoters" actively promote transcription
under most, but not necessarily all, environmental conditions and states of development or cell differentiation. Examples






EP 1 033 405 A2 



of constitutive promoters include the cauliflower mosaic virus (CaMV) 35S transcript initiation region and the 1' or 2'
promoter derived from T-DNA of Agrobacterium tumefaciens, and other transcription initiation regions from various
plant genes, such as the maize ubiquitin-1 promoter, known to those of skill.

[0047] Coordinately Expressed The term "coordinately expressed," as used in the current invention, refers to
genes that are expressed at the same or a similar time and/or stage and/or under the same or similar environmental
conditions.

[0048] Domain Domains are fingerprints or signatures that can be used to characterize protein families and/or
parts of proteins. Such fingerprints or signatures can comprise conserved (1) primary sequence, (2) secondary
structure, and/or (3) three-dimensional conformation. Generally, each domain has been associated with either a family of
proteins or motifs. Typically, these families and/or motifs have been correlated with specific in-vitro and/or in-vivo
activities. A domain can be any length, including the entirety of the sequence of a protein. Detailed descriptions of the
domains, associated families and motifs, and correlated activities of the polypeptides of the instant invention are
described below. Usually, the polypeptides with designated domain(s) can exhibit at least one activity that is exhibited by
any polypeptide that comprises the same domain(s).

[0049] Endogenous The term "endogenous," within the context of the current invention, refers to any
polynucleotide, polypeptide or protein sequence which is a natural part of a cell or organisms regenerated from said cell.
[0050] Exogenous "Exogenous," as referred to within, is any polynucleotide, polypeptide or protein sequence,
whether chimeric or not, that is initially or subsequently introduced into the genome of an individual host cell or the
organism regenerated from said host cell by any means other than by a sexual cross. Examples of means by which
this can be accomplished are described below, and include Agrobacterium-mediated transformation (of dicots, e.g.
Salomon et al., EMBO J. 3:141 (1984); Herrera-Estrella et al., EMBO J. 2:987 (1983); of monocots, representative
papers are those by Escudero et al., Plant J. 10:355 (1996), Ishida et al., Nature Biotechnology 14:745 (1996), May
et al., Bio/Technology 13:486 (1995)), biolistic methods (Armaleo et al., Current Genetics 17:97 (1990)), electroporation,
in planta techniques, and the like. Such a plant containing the exogenous nucleic acid is referred to here as a T0 for
the primary transgenic plant and T1 for the first generation. The term "exogenous" as used herein is also intended to
encompass inserting a naturally found element into a non-naturally found location.

[0051] Filler sequence As used herein, "filler sequence" refers to any nucleotide sequence that is inserted into a
DNA construct to evoke a particular spacing between particular components, such as a promoter and a coding region,
and may provide an additional attribute such as a restriction enzyme site.

[0052] Gene The term "gene," as used in the context of the current invention, encompasses all regulatory and coding
sequence contiguously associated with a single hereditary unit with a genetic function (see SCHEMATIC 1). Genes
can include non-coding sequences that modulate the genetic function that include, but are not limited to, those that
specify polyadenylation, transcriptional regulation, DNA conformation, chromatin conformation, extent and position of
base methylation and binding sites of proteins that control all of these. Genes comprised of "exons" (coding sequences),
which may be interrupted by "introns" (non-coding sequences), encode proteins. A gene's genetic function may require
only RNA expression or protein production, or may only require binding of proteins and/or nucleic acids without
associated expression. In certain cases, genes adjacent to one another may share sequence in such a way that one gene
will overlap the other. A gene can be found within the genome of an organism, artificial chromosome, plasmid, vector,
etc., or as a separate isolated entity.

[0053] Gene Family "Gene family" is used in the current invention to describe a group of functionally related genes,
each of which encodes a separate protein.

[0054] Heterologous sequences "Heterologous sequences" are those that are not operatively linked or are not
contiguous to each other in nature. For example, a promoter from corn is considered heterologous to an Arabidopsis
coding region sequence. Also, a promoter from a gene encoding a growth factor from corn is considered heterologous
to a sequence encoding the corn receptor for the growth factor. Regulatory element sequences, such as UTRs or 3'
end termination sequences that do not originate in nature from the same gene as the coding sequence originates from,
are considered heterologous to said coding sequence. Elements operatively linked in nature and contiguous to each
other are not heterologous to each other. On the other hand, these same elements remain operatively linked but become
heterologous if other filler sequence is placed between them. Thus, the promoter and coding sequences of a corn gene
expressing an amino acid transporter are not heterologous to each other, but the promoter and coding sequence of a
corn gene operatively linked in a novel manner are heterologous.

[0055] Homologous gene In the current invention, "homologous gene" refers to a gene that shares sequence
similarity with the gene of interest. This similarity may be in only a fragment of the sequence and often represents a
functional domain, examples including, without limitation, a DNA binding domain, a domain with tyrosine kinase
activity, or the like. The functional activities of homologous genes are not necessarily the same.

[0056] Inducible Promoter An "inducible promoter" in the context of the current invention refers to a promoter
which is regulated under certain conditions, such as light, chemical concentration, protein concentration, conditions in
an organism, cell, or organelle, etc. A typical example of an inducible promoter, which can be utilized with the
polynucleotides of the present invention, is PARSK1, the promoter from the Arabidopsis gene encoding a serine-threonine
kinase enzyme, and which promoter is induced by dehydration, abscissic acid and sodium chloride (Wang and
Goodman, Plant J. 8:37 (1995)). Examples of environmental conditions that may affect transcription by inducible promoters
include anaerobic conditions, elevated temperature, or the presence of light.
[0057] Intergenic region "Intergenic region," as used in the current invention, refers to nucleotide sequence
occurring in the genome that separates adjacent genes.

[0058] Mutant In the current invention, "mutant" refers to a heritable change in DNA sequence at a specific
location. Mutants of the current invention may or may not have an associated identifiable function when the mutant
gene is transcribed.

[0059] Orthologous Gene In the current invention, "orthologous gene" refers to a second gene that encodes a
gene product that performs a similar function as the product of a first gene. The orthologous gene may also have a
degree of sequence similarity to the first gene. The orthologous gene may encode a polypeptide that exhibits a degree
of sequence similarity to a polypeptide corresponding to a first gene. The sequence similarity can be found within a
functional domain or along the entire length of the coding sequence of the genes and/or their corresponding
polypeptides.

[0060] Percentage of sequence identity: "Percentage of sequence identity," as used herein, is determined by
comparing two optimally aligned sequences over a comparison window, where the fragment of the polynucleotide or
amino acid sequence in the comparison window may comprise additions or deletions (e.g., gaps or overhangs) as
compared to the reference sequence (which does not comprise additions or deletions) for optimal alignment of the two
sequences. The percentage is calculated by determining the number of positions at which the identical nucleic acid
base or amino acid residue occurs in both sequences to yield the number of matched positions, dividing the number
of matched positions by the total number of positions in the window of comparison and multiplying the result by 100
to yield the percentage of sequence identity. Optimal alignment of sequences for comparison may be conducted by
the local homology algorithm of Smith and Waterman Adv. Appl. Math. 2:482 (1981), by the homology alignment
algorithm of Needleman and Wunsch J. Mol. Biol. 48:443 (1970), by the search for similarity method of Pearson and
Lipman Proc. Natl. Acad. Sci. (USA) 85:2444 (1988), by computerized implementations of these algorithms (GAP,
BESTFIT, BLAST, FASTA, and TFASTA in the Wisconsin Genetics Software Package, Genetics Computer Group
(GCG), 575 Science Dr., Madison, WI), or by inspection. Given that two sequences have been identified for comparison,
GAP and BESTFIT are preferably employed to determine their optimal alignment. Typically, the default values of 5.00
for gap weight and 0.30 for gap weight length are used. The term "substantial sequence identity" between polynucleotide
or polypeptide sequences refers to a polynucleotide or polypeptide comprising a sequence that has at least 80%
sequence identity, preferably at least 85%, more preferably at least 90% and most preferably at least 95%, even more
preferably at least 96%, 97%, 98% or 99% sequence identity compared to a reference sequence using the programs.
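The calculation described in the definition above can be sketched in a few lines. This is only an illustration under stated assumptions: the two sequences are taken to be already optimally aligned (for example by GAP or BESTFIT), gaps are written as "-", and the function name percent_identity is hypothetical rather than part of any program named in the text.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of sequence identity over a comparison window.

    Both inputs are assumed to be pre-aligned and of equal length;
    a position counts as matched only if both residues are identical
    and neither is a gap character.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("comparison window is empty")
    matches = sum(
        1
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x == y and x != gap
    )
    # matched positions / total positions in the window, times 100
    return 100.0 * matches / len(aligned_a)


if __name__ == "__main__":
    # 8 matched positions in a 10-position window
    print(percent_identity("ACGTACGT-A", "ACGTTCGTGA"))
```

Under this sketch, a pair scoring 80.0 or more would meet the lowest "substantial sequence identity" threshold given above.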
[0061] Plant Promoter: A "plant promoter" is a promoter capable of initiating transcription in plant cells and can
drive or facilitate transcription of a fragment of the SDF of the instant invention or a coding sequence of the SDF of the
instant invention. Such promoters need not be of plant origin. For example, promoters derived from plant viruses, such
as the CaMV 35S promoter, or from Agrobacterium tumefaciens, such as the T-DNA promoters, can be plant promoters.
A typical example of a plant promoter of plant origin is the maize ubiquitin-1 (ubi-1) promoter known to those of skill
in the art.
[0062] Promoter: The term "promoter," as used herein, refers to a region of sequence determinants located
upstream from the start of transcription of a gene and which are involved in recognition and binding of RNA polymerase
and other proteins to initiate and modulate transcription. A basal promoter is the minimal sequence necessary for
assembly of a transcription complex required for transcription initiation. Basal promoters frequently include a "TATA
box" element usually located between 15 and 35 nucleotides upstream from the site of initiation of transcription. Basal
promoters also sometimes include a "CCAAT box" element (typically a sequence CCAAT) and/or a GGGCG sequence,
usually located between 40 and 200 nucleotides, preferably 60 to 120 nucleotides, upstream from the start site of
transcription.
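As a rough illustration of the positional relationship described for basal promoters, the sketch below looks for a TATA-like motif that starts 15 to 35 nucleotides upstream of a known transcription start position. The motif string "TATA", the function name find_tata_upstream and the zero-based indexing are assumptions made for the example; real TATA elements are degenerate and are usually scored with position weight matrices.

```python
def find_tata_upstream(seq: str, tss: int, motif: str = "TATA",
                       min_dist: int = 15, max_dist: int = 35) -> list:
    """Return the upstream distances (in nt) at which `motif` starts.

    `tss` is the zero-based index of the transcription start site in `seq`.
    A distance d means the motif begins at index tss - d.
    """
    seq = seq.upper()
    hits = []
    for dist in range(min_dist, max_dist + 1):
        start = tss - dist
        if start >= 0 and seq[start:start + len(motif)] == motif:
            hits.append(dist)
    return hits


if __name__ == "__main__":
    # Toy sequence: a TATAAA element placed 30 nt upstream of index 50
    toy = "G" * 20 + "TATAAA" + "G" * 24 + "ATGC"
    print(find_tata_upstream(toy, tss=50))
```

The same window logic could be reused for the CCAAT element by passing motif="CCAAT" with min_dist=40 and max_dist=200.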

[0063] Public sequence: The term "public sequence," as used in the context of the instant application, refers to
any sequence that has been deposited in a publicly accessible database. This term encompasses both amino acid
and nucleotide sequences. Such sequences are publicly accessible, for example, on the BLAST databases on the
NCBI FTP web site (accessible at ncbi.nlm.nih.gov/blast). The database at the NCBI FTP site utilizes "gi" numbers assigned
by NCBI as a unique identifier for each sequence in the databases, thereby providing a non-redundant database for
sequence from various databases, including GenBank, EMBL, DDBJ (DNA Database of Japan) and PDB (Brookhaven
Protein Data Bank).

[0064] Regulatory Sequence: The term "regulatory sequence," as used in the current invention, refers to any
nucleotide sequence that influences transcription or translation initiation and rate, and stability and/or mobility of the
transcript or polypeptide product. Regulatory sequences include, but are not limited to, promoters, promoter control
elements, protein binding sequences, 5' and 3' UTRs, transcriptional start site, termination sequence, polyadenylation
sequence, introns, certain sequences within a coding sequence, etc.
[0065] Related Sequences: "Related sequences" refer to either a polypeptide or a nucleotide sequence that
exhibits some degree of sequence similarity with a sequence described by the REF and SEQ tables.
[0066] Scaffold Attachment Region (SAR): As used herein, "scaffold attachment region" is a DNA sequence that
anchors chromatin to the nuclear matrix or scaffold to generate loop domains that can have either a transcriptionally
active or inactive structure (Spiker and Thompson (1996) Plant Physiol. 110: 15-21).

[0067] Sequence-determined DNA fragments (SDFs): "Sequence-determined DNA fragments" as used in the
current invention are isolated sequences of genes, fragments of genes, intergenic regions or contiguous DNA from
plant genomic DNA or cDNA or RNA the sequence of which has been determined.

[0068] Signal Peptide: A "signal peptide" as used in the current invention is an amino acid sequence that targets
the protein for secretion, for transport to an intracellular compartment or organelle or for incorporation into a membrane.
Signal peptides are indicated in the tables and a more detailed description is located below.

[0069] Specific Promoter: In the context of the current invention, "specific promoters" refers to a subset of inducible
promoters that have a high preference for being induced in a specific tissue or cell and/or at a specific time during
development of an organism. By "high preference" is meant at least 3-fold, preferably 5-fold, more preferably at least
10-fold, still more preferably at least 20-fold, 50-fold or 100-fold increase in transcription in the desired tissue over the
transcription in any other tissue. Typical examples of temporal and/or tissue specific promoters of plant origin that can
be used with the polynucleotides of the present invention are: PTA29, a promoter which is capable of driving gene
transcription specifically in tapetum and only during anther development (Koltonow et al., Plant Cell 2:1201 (1990));
RCc2 and RCc3, promoters that direct root-specific gene transcription in rice (Xu et al., Plant Mol. Biol. 27:237 (1995));
TobRB27, a root-specific promoter from tobacco (Yamamoto et al., Plant Cell 3:371 (1991)). Examples of tissue-specific
promoters under developmental control include promoters that initiate transcription only in certain tissues or organs,
such as root, ovule, fruit, seeds, or flowers. Other suitable promoters include those from genes encoding storage
proteins or the lipid body membrane protein, oleosin. A few root-specific promoters are noted above.
[0070] Stringency: "Stringency" as used herein is a function of probe length, probe composition (G + C content),
and salt concentration, organic solvent concentration, and temperature of hybridization or wash conditions. Stringency
is typically compared by the parameter Tm, which is the temperature at which 50% of the complementary molecules
in the hybridization are hybridized, in terms of a temperature differential from Tm. High stringency conditions are those
providing a condition of Tm - 5°C to Tm - 10°C. Medium or moderate stringency conditions are those providing Tm -
20°C to Tm - 29°C. Low stringency conditions are those providing a condition of Tm - 40°C to Tm - 48°C. The relationship
of hybridization conditions to Tm (in °C) is expressed in the mathematical equation

$$T_m = 81.5 + 16.6\,\log_{10}[\mathrm{Na}^+] + 0.41\,(\%\mathrm{G{+}C}) - \frac{600}{N} \qquad (1)$$

where N is the length of the probe. This equation works well for probes 14 to 70 nucleotides in length that are identical
to the target sequence. The equation below for Tm of DNA-DNA hybrids is useful for probes in the range of 50 to greater
than 500 nucleotides, and for conditions that include an organic solvent (formamide).

$$T_m = 81.5 + 16.6\,\log_{10}\!\left\{\frac{[\mathrm{Na}^+]}{1 + 0.7\,[\mathrm{Na}^+]}\right\} + 0.41\,(\%\mathrm{G{+}C}) - \frac{500}{L} - 0.63\,(\%\mathrm{formamide}) \qquad (2)$$

where L is the length of the probe in the hybrid. (P. Tijessen, "Hybridization with Nucleic Acid Probes" in Laboratory
Techniques in Biochemistry and Molecular Biology, P.C. van der Vliet, ed., c. 1993 by Elsevier, Amsterdam.) The Tm
of equation (2) is affected by the nature of the hybrid; for DNA-RNA hybrids Tm is 10-15°C higher than calculated, for
RNA-RNA hybrids Tm is 20-25°C higher. Because the Tm decreases about 1°C for each 1% decrease in homology
when a long probe is used (Bonner et al., J. Mol. Biol. 81:123 (1973)), stringency conditions can be adjusted to favor
detection of identical genes or related family members.
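The two Tm relationships and the hybrid-type corrections can be evaluated numerically as sketched below. This is a hedged illustration, not a prescribed procedure: the salt term is written in the conventional form with a positive 16.6 coefficient on the base-10 logarithm of the molar sodium concentration, and the function names (tm_short_probe, tm_long_probe, hybrid_tm_range) are hypothetical.

```python
import math

def tm_short_probe(na_molar: float, gc_percent: float, n: int) -> float:
    """Equation (1): probes of roughly 14 to 70 nt identical to the target."""
    return 81.5 + 16.6 * math.log10(na_molar) + 0.41 * gc_percent - 600.0 / n

def tm_long_probe(na_molar: float, gc_percent: float, length: int,
                  formamide_percent: float = 0.0) -> float:
    """Equation (2): DNA-DNA hybrids, probes of about 50 to more than 500 nt."""
    salt_term = 16.6 * math.log10(na_molar / (1.0 + 0.7 * na_molar))
    return (81.5 + salt_term + 0.41 * gc_percent
            - 500.0 / length - 0.63 * formamide_percent)

# Increments quoted in the text for hybrids other than DNA-DNA (in deg C)
_HYBRID_OFFSETS = {"DNA-DNA": (0.0, 0.0), "DNA-RNA": (10.0, 15.0), "RNA-RNA": (20.0, 25.0)}

def hybrid_tm_range(tm_dna_dna: float, hybrid: str) -> tuple:
    """Return the (low, high) Tm expected for the given hybrid type."""
    low, high = _HYBRID_OFFSETS[hybrid]
    return (tm_dna_dna + low, tm_dna_dna + high)


if __name__ == "__main__":
    print(tm_short_probe(na_molar=1.0, gc_percent=50.0, n=20))
    base = tm_long_probe(na_molar=1.0, gc_percent=50.0, length=500, formamide_percent=50.0)
    print(base, hybrid_tm_range(base, "DNA-RNA"))
```

Because each 1% decrease in homology lowers Tm by about 1°C for long probes, a caller could subtract the expected percent mismatch from the returned value to estimate the Tm of a related, non-identical target.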

[0071] Equation (2) is derived assuming equilibrium and therefore, hybridizations according to the present invention
are most preferably performed under conditions of probe excess and for sufficient time to achieve equilibrium. The
time required to reach equilibrium can be shortened by inclusion of a hybridization accelerator such as dextran sulfate
or another high volume polymer in the hybridization buffer.

[0072] Stringency can be controlled during the hybridization reaction or after hybridization has occurred by altering
the salt and temperature conditions of the wash solutions used. The formulas shown above are equally valid when
used to compute the stringency of a wash solution. Preferred wash solution stringencies lie within the ranges stated
above; high stringency is 5-8°C below Tm, medium or moderate stringency is 26-29°C below Tm and low stringency is
45-48°C below Tm.
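The preferred wash ranges can be expressed as a small lookup, sketched here under the assumption that Tm has already been computed for the wash solution. The names WASH_WINDOWS and classify_wash are illustrative only, and temperatures falling between the stated windows are simply reported as unclassified.

```python
# Preferred wash windows, expressed as degrees C below Tm (from the text)
WASH_WINDOWS = {
    "high": (5.0, 8.0),
    "medium": (26.0, 29.0),
    "low": (45.0, 48.0),
}

def classify_wash(tm_c: float, wash_temp_c: float) -> str:
    """Label a wash temperature by how far it lies below Tm."""
    delta = tm_c - wash_temp_c
    for label, (lo, hi) in WASH_WINDOWS.items():
        if lo <= delta <= hi:
            return label
    return "unclassified"

def wash_temperature_range(tm_c: float, stringency: str) -> tuple:
    """Return the (coolest, warmest) wash temperature for a stringency label."""
    lo, hi = WASH_WINDOWS[stringency]
    return (tm_c - hi, tm_c - lo)


if __name__ == "__main__":
    print(classify_wash(70.0, 64.0))
    print(wash_temperature_range(70.0, "medium"))
```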

[0073] Substantially free of: A composition containing A is "substantially free of" B when at least 85% by weight
of the total A+B in the composition is A. Preferably, A comprises at least about 90% by weight of the total of A+B in
the composition, more preferably at least about 95% or even 99% by weight. For example, a plant gene or a DNA
sequence can be considered substantially free of other plant genes or DNA sequences.

[0074] Translational start site: In the context of the current invention, a "translational start site" is usually an ATG
in the cDNA transcript, more usually the first ATG. A single cDNA, however, may have multiple translational start sites.
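Candidate translational start sites in a cDNA can be listed with a simple scan, as in the sketch below. It only reports ATG positions; it does not evaluate sequence context or reading-frame length, and the function name atg_positions is an assumption made for the example.

```python
def atg_positions(cdna: str) -> list:
    """Return zero-based positions of every ATG in a cDNA (or mRNA) sequence."""
    seq = cdna.upper().replace("U", "T")  # accept RNA notation as well
    return [i for i in range(len(seq) - 2) if seq[i:i + 3] == "ATG"]

def first_atg(cdna: str):
    """Return the position of the first ATG, or None if there is none."""
    positions = atg_positions(cdna)
    return positions[0] if positions else None


if __name__ == "__main__":
    print(atg_positions("ccATGaaATGtt"), first_atg("ccATGaaATGtt"))
```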
[0075] Transcription start site: "Transcription start site" is used in the current invention to describe the point at
which transcription is initiated. This point is typically located about 25 nucleotides downstream from a TFIID binding
site, such as a TATA box. Transcription can initiate at one or more sites within the gene, and a single gene may have
multiple transcriptional start sites, some of which may be specific for transcription in a particular cell-type or tissue.
[0076] Untranslated region (UTR): A "UTR" is any contiguous series of nucleotide bases that is transcribed, but
is not translated. These untranslated regions may be associated with particular functions such as increasing mRNA
message stability. Examples of UTRs include, but are not limited to, polyadenylation signals, termination sequences,
sequences located between the transcriptional start site and the first exon (5' UTR) and sequences located between
the last exon and the end of the mRNA (3' UTR).
[0077] Variant: The term "variant" is used herein to denote a polypeptide or protein or polynucleotide molecule
that differs from others of its kind in some way. For example, polypeptide and protein variants can consist of changes
in amino acid sequence and/or charge and/or post-translational modifications (such as glycosylation, etc).

DETAILED DESCRIPTION OF THE INVENTION 

I. Polynucleotides 

[0078] Exemplified SDFs of the invention represent fragments of the genome of corn, wheat, rice, soybean or
Arabidopsis and/or represent mRNA expressed from that genome. The isolated nucleic acid of the invention also
encompasses corresponding fragments of the genome and/or cDNA complement of other organisms as described in detail
below.

[0079] Polynucleotides of the invention can be isolated from polynucleotide libraries using primers comprising
sequence similar to those described by the REF and SEQ Tables. See, for example, the methods described in Sambrook
et al., supra.

[0080] Alternatively, the polynucleotides of the invention can be produced by chemical synthesis. Such synthesis
methods are described below.

[0081] It is contemplated that the nucleotide sequences presented herein may contain some small percentage of
errors. These errors may arise in the normal course of determination of nucleotide sequences. Sequence errors can
be corrected by obtaining seeds deposited under the accession numbers cited above, propagating them, isolating
genomic DNA or appropriate mRNA from the resulting plants or seeds thereof, amplifying the relevant fragment of the
genomic DNA or mRNA using primers having a sequence that flanks the erroneous sequence, and sequencing the
amplification product.

I.A. Probes, Primers and Substrates 

[0082] SDFs of the invention can be applied to substrates for use in array applications such as, but not limited to,
assays of global gene expression, for example under varying conditions of development or growth. The arrays
can also be used in diagnostic or forensic methods (WO95/35505, US 5,445,943 and US 5,410,270).
[0083] Probes and primers of the instant invention will hybridize to a polynucleotide comprising a sequence in REF
and SEQ TABLES 1 AND 2. Though many different nucleotide sequences can encode an amino acid sequence, the
sequences of REF and SEQ TABLES 1 AND 2 are generally preferred for encoding polypeptides of the invention.
However, the sequence of the probes and/or primers of the instant invention need not be identical to those in REF and
SEQ TABLES 1 AND 2 or the complements thereof. For example, some variation in probe or primer sequence and/or
length can allow additional family members to be detected, as well as orthologous genes and more taxonomically
distant related sequences. Similarly, probes and/or primers of the invention can include additional nucleotides that
serve as a label for detecting the formed duplex or for subsequent cloning purposes.

[0084] Probe length will vary depending on the application. For use as primers, probes are 12-40 nucleotides,
preferably 18-30 nucleotides long. For use in mapping, probes are preferably 50 to 500 nucleotides, more preferably 100-250
nucleotides long. For Southern hybridizations, probes as long as several kilobases can be used as explained below.
[0085] The probes and/or primers can be produced by synthetic procedures such as the triester method of Matteucci
et al. J. Am. Chem. Soc. 103:3185 (1981); or according to Urdea et al. Proc. Natl. Acad. 80:7461 (1981) or using
commercially available automated oligonucleotide synthesizers.
I.B. Methods of Detection and Isolation

[0086] The polynucleotides of the invention can be utilized in a number of methods known to those skilled in the art
as probes and/or primers to isolate and detect polynucleotides, including, without limitation: Southerns, Northerns,
Branched DNA hybridization assays, polymerase chain reaction, and microarray assays, and variations thereof.
Specific methods given by way of examples, and discussed below include:

Hybridization 
Methods of Mapping 
Southern Blotting

Isolating cDNA from Related Organisms 
Isolating and/or Identifying Orthologous Genes 

Also, the nucleic acid molecules of the invention can be used in other methods, such as high density oligonucleotide
hybridizing assays, described, for example, in U.S. Pat. Nos. 6,004,753; 5,945,306; 5,945,287; 5,945,308; 5,919,886;
5,919,661; 5,919,627; 5,874,248; 5,871,973; 5,871,971; and 5,871,930; and PCT Pub. Nos. WO 9946380; WO
9933981; WO 9933870; WO 9931252; WO 9915658; WO 9906572; WO 9858052; WO 9958672; and WO 9810858.

B.1. Hybridization 

[0087] The isolated SDFs of REF and SEQ TABLES 1 AND 2 of the present invention can be used as probes and/
or primers for detection and/or isolation of related polynucleotide sequences through hybridization. Hybridization of
one nucleic acid to another constitutes a physical property that defines the subject SDF of the invention and the identified
related sequences. Also, such hybridization imposes structural limitations on the pair. A good general discussion of
the factors for determining hybridization conditions is provided by Sambrook et al. ("Molecular Cloning, a Laboratory
Manual", 2nd ed., c. 1989 by Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY; see esp. chapters 11 and
12). Additional considerations and details of the physical chemistry of hybridization are provided by G.H. Keller and
M.M. Manak "DNA Probes", 2nd Ed. pp. 1-25, c. 1993 by Stockton Press, New York, NY.

[0088] Depending on the stringency of the conditions under which these probes and/or primers are used,
polynucleotides exhibiting a wide range of similarity to those in REF and SEQ TABLES 1 AND 2 can be detected or isolated.
When the practitioner wishes to examine the result of membrane hybridizations under a variety of stringencies, an
efficient way to do so is to perform the hybridization under a low stringency condition, then to wash the hybridization
membrane under increasingly stringent conditions.

[0089] When using SDFs to identify orthologous genes in other species, the practitioner will preferably adjust the
amount of target DNA of each species so that, as nearly as is practical, the same number of genome equivalents are
present for each species examined. This prevents faint signals from species having large genomes, and thus small
numbers of genome equivalents per mass of DNA, from erroneously being interpreted as absence of the corresponding
gene in the genome.
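The adjustment to equal genome equivalents reduces to simple arithmetic, sketched below. The average mass of about 650 g/mol per base pair is a commonly used approximation and is an assumption here, as are the function name and the round genome sizes used in the demonstration; actual genome sizes should be taken from the literature for the species in question.

```python
AVOGADRO = 6.02214076e23      # molecules per mole
BP_MASS_G_PER_MOL = 650.0     # approximate average mass of one base pair (assumption)

def mass_ng_for_genome_equivalents(genome_size_bp: float, equivalents: float) -> float:
    """Mass of genomic DNA (in ng) that contains the requested number of
    haploid genome equivalents."""
    grams = equivalents * genome_size_bp * BP_MASS_G_PER_MOL / AVOGADRO
    return grams * 1e9


if __name__ == "__main__":
    # Illustrative genome sizes only (1e8 bp versus 2e9 bp)
    small = mass_ng_for_genome_equivalents(1.0e8, 1.0e6)
    large = mass_ng_for_genome_equivalents(2.0e9, 1.0e6)
    print(small, large, large / small)
```

In this sketch the larger genome needs proportionally more DNA per lane (here 20-fold) to present the same number of gene copies to the probe.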

[0090] The probes and/or primers of the instant invention can also be used to detect or isolate nucleotides that are
"identical" to the probes or primers. Two nucleic acid sequences or polypeptides are said to be "identical" if the sequence
of nucleotides or amino acid residues, respectively, in the two sequences is the same when aligned for maximum
correspondence as described below.

[0091] Isolated polynucleotides within the scope of the invention also include allelic variants of the specific sequences
presented in REF and SEQ TABLES 1 AND 2. The probes and/or primers of the invention can also be used to detect
and/or isolate polynucleotides exhibiting at least 80% sequence identity with the sequences of REF and SEQ TABLES
1 AND 2 or fragments thereof.

[0092] With respect to nucleotide sequences, degeneracy of the genetic code provides the possibility to substitute
at least one base of the base sequence of a gene with a different base without causing the amino acid sequence of
the polypeptide produced from the gene to be changed. Hence, the DNA of the present invention may also have any
base sequence that has been changed from a sequence in REF and SEQ TABLES 1 AND 2 by substitution in
accordance with degeneracy of genetic code. References describing codon usage include: Carels et al., J. Mol. Evol. 46: 45
(1998) and Fennoy et al., Nucl. Acids Res. 21(23): 5294 (1993).
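Degeneracy can be checked directly by translating two coding sequences and comparing the products. The sketch below builds the standard genetic code table and is meant only as an illustration; the helper names and the toy sequences are assumptions and are not sequences from the REF and SEQ Tables.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO)}

def translate(dna: str) -> str:
    """Translate a coding DNA sequence in frame 0; '*' marks a stop codon.
    Trailing bases that do not complete a codon are ignored."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))

def is_synonymous_variant(dna_a: str, dna_b: str) -> bool:
    """True if the two sequences differ at the DNA level but encode the same protein."""
    return dna_a.upper() != dna_b.upper() and translate(dna_a) == translate(dna_b)


if __name__ == "__main__":
    original = "ATGGCTCTTTAA"   # Met-Ala-Leu-stop
    variant = "ATGGCCTTGTAA"    # same protein, different codons for Ala and Leu
    print(translate(original), translate(variant), is_synonymous_variant(original, variant))
```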

B.2. Mapping 

[0093] The isolated SDF DNA of the invention can be used to create various types of genetic and physical maps of
the genome of corn, Arabidopsis, soybean, rice, wheat, or other plants. Some SDFs may be absolutely associated
with particular phenotypic traits, allowing construction of gross genetic maps. While not all SDFs will immediately be
associated with a phenotype, all SDFs can be used as probes for identifying polymorphisms associated with phenotypes
of interest. Briefly, one method of mapping involves total DNA isolation from individuals. It is subsequently cleaved with
one or more restriction enzymes, separated according to mass, transferred to a solid support, hybridized with SDF
DNA and the pattern of fragments compared. Polymorphisms associated with a particular SDF are visualized as
differences in the size of fragments produced between individual DNA samples after digestion with a particular restriction
enzyme and hybridization with the SDF. After identification of polymorphic SDF sequences, linkage studies can be
conducted. By using the individuals showing polymorphisms as parents in crossing programs, F2 progeny recombinants
or recombinant inbreds, for example, are then analyzed. The order of DNA polymorphisms along the chromosomes
can be determined based on the frequency with which they are inherited together versus independently. The closer
two polymorphisms are together in a chromosome the higher the probability that they are inherited together. Integration
of the relative positions of all the polymorphisms and associated marker SDFs can produce a genetic map of the
species, where the distances between markers reflect the recombination frequencies in that chromosome segment.
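The idea of ordering polymorphisms by how often they are inherited together can be sketched for three markers scored on a set of recombinant inbred lines. This is a deliberately simplified illustration: each line is scored "A" or "B" for the parental allele (with "-" for missing data), the observed fraction of lines with differing alleles is used directly as the measure of separation, and no mapping function or correction for multiple generations of recombination is applied. All names and genotype strings are hypothetical.

```python
from itertools import combinations

def recombinant_fraction(calls_1: str, calls_2: str) -> float:
    """Fraction of lines in which two markers carry alleles from different parents."""
    scored = [(a, b) for a, b in zip(calls_1, calls_2) if a != "-" and b != "-"]
    if not scored:
        raise ValueError("no lines scored for both markers")
    return sum(1 for a, b in scored if a != b) / len(scored)

def order_three_markers(genotypes: dict):
    """Order exactly three markers: the pair that recombines most often is
    placed at the two ends and the remaining marker in the middle."""
    if len(genotypes) != 3:
        raise ValueError("this sketch handles exactly three markers")
    fractions = {
        frozenset(pair): recombinant_fraction(genotypes[pair[0]], genotypes[pair[1]])
        for pair in combinations(genotypes, 2)
    }
    ends = max(fractions, key=fractions.get)
    middle = (set(genotypes) - ends).pop()
    left, right = sorted(ends)
    return [left, middle, right], fractions


if __name__ == "__main__":
    lines = {
        "m1": "AAAAABBBBB",
        "m2": "AAAABBBBBB",
        "m3": "AAABBBBBBA",
    }
    print(order_three_markers(lines))
```

For more than three markers, dedicated linkage-mapping software is normally used, since the number of possible orders grows rapidly.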
[0094] The use of recombinant inbred lines for such genetic mapping is described for Arabidopsis by Alonso-Blanco
et al. (Methods in Molecular Biology, vol. 82, "Arabidopsis Protocols", pp. 137-146, J.M. Martinez-Zapater and J. Salinas,
eds., c. 1998 by Humana Press, Totowa, NJ) and for corn by Burr ("Mapping Genes with Recombinant Inbreds", pp.
249-254. In Freeling, M. and V. Walbot (Ed.), The Maize Handbook, c. 1994 by Springer-Verlag New York, Inc.: New
York, NY, USA; Berlin, Germany; Burr et al. Genetics (1998) 118: 519; Gardiner, J. et al., (1993) Genetics 134: 917).
This procedure, however, is not limited to plants and can be used for other organisms (such as yeast) or for individual
cells.

[0095] The SDFs of the present invention can also be used for simple sequence repeat (SSR) mapping. Rice SSR
mapping is described by Morgante et al. (The Plant Journal (1993) 3: 165), Panaud et al. (Genome (1995) 38: 1170);
Senior et al. (Crop Science (1996) 36: 1676), Taramino et al. (Genome (1996) 39: 277) and Ahn et al. (Molecular and
General Genetics (1993) 241: 483-90). SSR mapping can be achieved using various methods. In one instance,
polymorphisms are identified when sequence specific probes contained within an SDF flanking an SSR are made and used
in polymerase chain reaction (PCR) assays with template DNA from two or more individuals of interest. Here, a change
in the number of tandem repeats between the SSR-flanking sequences produces differently sized fragments (U.S.
Patent 766,847). Alternatively, polymorphisms can be identified by using the PCR fragment produced from the SSR-
flanking sequence specific primer reaction as a probe against Southern blots representing different individuals (U.H.
Refseth et al., (1997) Electrophoresis 18: 1519).
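The relationship between tandem repeat number and PCR fragment size can be made concrete with a short calculation. The flank lengths, repeat unit and function names below are assumptions chosen for illustration; the flanks are taken to include the primer-binding sequences.

```python
def ssr_amplicon_length(left_flank_bp: int, right_flank_bp: int,
                        repeat_unit: str, repeat_count: int) -> int:
    """Expected amplicon length when primers sit in the sequences flanking an SSR."""
    return left_flank_bp + len(repeat_unit) * repeat_count + right_flank_bp

def repeat_count_difference(size_a_bp: int, size_b_bp: int, repeat_unit: str) -> float:
    """Number of repeat units separating two observed fragment sizes."""
    return (size_b_bp - size_a_bp) / len(repeat_unit)


if __name__ == "__main__":
    allele_1 = ssr_amplicon_length(60, 55, "GA", 12)
    allele_2 = ssr_amplicon_length(60, 55, "GA", 15)
    print(allele_1, allele_2, repeat_count_difference(allele_1, allele_2, "GA"))
```

A non-integer result from repeat_count_difference would suggest that the size difference arises from something other than a change in repeat number, such as an insertion or deletion in a flank.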

[0096] Genetic and physical maps of crop species have many uses. For example, these maps can be used to devise
positional cloning strategies for isolating novel genes from the mapped crop species. In addition, because the genomes
of closely related species are largely syntenic (that is, they display the same ordering of genes within the genome),
these maps can be used to isolate novel alleles from relatives of crop species by positional cloning strategies.
[0097] The various types of maps discussed above can be used with the SDFs of the invention to identify Quantitative
Trait Loci (QTLs). Many important crop traits, such as the solids content of tomatoes, are quantitative traits and result
from the combined interactions of several genes. These genes reside at different loci in the genome, oftentimes on
different chromosomes, and generally exhibit multiple alleles at each locus. The SDFs of the invention can be used to
identify QTLs and isolate specific alleles as described by de Vicente and Tanksley (Genetics 134: 585 (1993)). In addition
to isolating QTL alleles in present crop species, the SDFs of the invention can also be used to isolate alleles from the
corresponding QTL of wild relatives. Transgenic plants having various combinations of QTL alleles can then be created
and the effects of the combinations measured. Once a desired allele combination has been identified, crop improvement
can be accomplished either through biotechnological means or by directed conventional breeding programs (for review
see Tanksley and McCouch, Science 277: 1063 (1997)).

[0098] In another embodiment, the SDFs can be used to help create physical maps of the genome of corn,
Arabidopsis and related species. Where SDFs have been ordered on a genetic map, as described above, they can be used
as probes to discover which clones in large libraries of plant DNA fragments in YACs, BACs, etc. contain the same
SDF or similar sequences, thereby facilitating the assignment of the large DNA fragments to chromosomal positions.
Subsequently, the large BACs, YACs, etc. can be ordered unambiguously by more detailed studies of their sequence
composition (e.g. Marra et al. (1997) Genomic Research 7: 1072-1084) and by using their end or other sequences to
find the identical sequences in other cloned DNA fragments. The overlapping of DNA sequences in this way allows
large contigs of plant sequences to be built that, when sufficiently extended, provide a complete physical map of a
chromosome. Sometimes the SDFs themselves will provide the means of joining cloned sequences into a contig.
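Joining cloned sequences through shared end sequence can be illustrated with a minimal exact-overlap merge. Real contig assembly tolerates sequencing errors and considers both strands; the sketch below assumes error-free sequences in the same orientation, and the function name and the minimum overlap of four bases are arbitrary choices for the example.

```python
def merge_by_overlap(left: str, right: str, min_overlap: int = 4):
    """Merge two sequences if a suffix of `left` exactly matches a prefix of
    `right`; the longest such overlap is used. Returns None if no overlap of
    at least `min_overlap` bases exists."""
    longest = min(len(left), len(right))
    for k in range(longest, min_overlap - 1, -1):
        if left[-k:] == right[:k]:
            return left + right[k:]
    return None


if __name__ == "__main__":
    print(merge_by_overlap("ACGTTGCA", "TGCAGGT"))
    print(merge_by_overlap("AAAA", "CCCC"))
```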
[0099] The patent publication WO95/35505 and U.S. Patents 5,445,943 and 5,410,270 describe scanning multiple
alleles of a plurality of loci using hybridization to arrays of oligonucleotides. These techniques are useful for each of
the types of mapping discussed above.

[0100] Following the procedures described above and using a plurality of the SDFs of the present invention, any
individual can be genotyped. These individual genotypes can be used for the identification of particular cultivars,
varieties, lines, ecotypes and genetically modified plants or can serve as tools for subsequent genetic studies involving
multiple phenotypic traits.

B.3 Southern Blot Hybridization 

[0101] The sequences from REF and SEQ TABLES 1 AND 2 can be used as probes for various hybridization
techniques. These techniques are useful for detecting target polynucleotides in a sample or for determining whether
transgenic plants, seeds or host cells harbor a gene or sequence of interest and thus might be expected to exhibit a particular
trait or phenotype.

[0102] In addition, the SDFs from the invention can be used to isolate additional members of gene families from the
same or different species and/or orthologous genes from the same or different species. This is accomplished by
hybridizing an SDF to, for example, a Southern blot containing the appropriate genomic DNA or cDNA. Given the resulting
hybridization data, one of ordinary skill in the art could distinguish and isolate the correct DNA fragments by size,
restriction sites, sequence and stated hybridization conditions from a gel or from a library.

[0103] Identification and isolation of orthologous genes from closely related species and alleles within a species is
particularly desirable because of their potential for crop improvement. Many important crop traits, such as the solid
content of tomatoes, result from the combined interactions of the products of several genes residing at different loci in
the genome. Generally, alleles at each of these loci can make quantitative differences to the trait. By identifying and
isolating numerous alleles for each locus from within or different species, transgenic plants with various combinations
of alleles can be created and the effects of the combinations measured. Once a more favorable allele combination has
been identified, crop improvement can be accomplished either through biotechnological means or by directed
conventional breeding programs (Tanksley et al. Science 277: 1063 (1997)).

[0104] The results from hybridizations of the SDFs of the invention to, for example, Southern blots containing DNA
from another species can also be used to generate restriction fragment maps for the corresponding genomic regions.
These maps provide additional information about the relative positions of restriction sites within fragments, further
distinguishing mapped DNA from the remainder of the genome.

[0105] Physical maps can be made by digesting genomic DNA with different combinations of restriction enzymes.
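The effect of digesting with different enzyme combinations can be simulated on a known sequence, which is one way to predict the fragment pattern expected on a Southern blot. The sketch below treats the DNA as linear, considers top-strand cut positions only, and includes just three well-known enzymes (EcoRI, BamHI and HindIII, each cutting after the first base of its six-base site). The dictionary and function names are assumptions for the example.

```python
# Recognition site and top-strand cut offset (bases from the start of the site)
ENZYMES = {
    "EcoRI": ("GAATTC", 1),    # G^AATTC
    "BamHI": ("GGATCC", 1),    # G^GATCC
    "HindIII": ("AAGCTT", 1),  # A^AGCTT
}

def digest(seq: str, enzymes) -> list:
    """Return fragment lengths, in order along a linear molecule, after
    cutting with every enzyme named in `enzymes`."""
    seq = seq.upper()
    cuts = set()
    for name in enzymes:
        site, offset = ENZYMES[name]
        start = seq.find(site)
        while start != -1:
            cuts.add(start + offset)
            start = seq.find(site, start + 1)
    bounds = [0] + sorted(cuts) + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    toy = "A" * 10 + "GAATTC" + "C" * 20 + "GGATCC" + "T" * 8
    for combo in (["EcoRI"], ["BamHI"], ["EcoRI", "BamHI"]):
        print(combo, digest(toy, combo))
```

Comparing single and double digests in this way shows how the relative positions of sites can be inferred: a fragment present in a single digest that disappears in the double digest must contain a site for the second enzyme.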
[0106] Probes for Southern blotting to distinguish individual restriction fragments can range in size from 15 to 20
nucleotides to several thousand nucleotides. More preferably, the probe is 100 to 1,000 nucleotides long for identifying
members of a gene family when it is found that repetitive sequences would complicate the hybridization. For identifying
an entire corresponding gene in another species, the probe is more preferably the length of the gene, typically 2,000
to 10,000 nucleotides, but probes 50-1,000 nucleotides long might be used. Some genes, however, might require
probes up to 1,500 nucleotides long or overlapping probes constituting the full-length sequence to span their lengths.
[0107] Also, while it is preferred that the probe be homogeneous with respect to its sequence, it is not necessary.
For example, as described below, a probe representing members of a gene family having diverse sequences can be
generated using PCR to amplify genomic DNA or RNA templates using primers derived from SDFs that include
sequences that define the gene family.

[0108] For identifying corresponding genes in another species, the next most preferable probe is a cDNA spanning
the entire coding sequence, which allows all of the mRNA-coding fragment of the gene to be identified. Probes for
Southern blotting can easily be generated from SDFs by making primers having the sequence at the ends of the SDF
and using corn or Arabidopsis genomic DNA as a template. In instances where the SDF includes sequence conserved
among species, primers including the conserved sequence can be used for PCR with genomic DNA from a species of
interest to obtain a probe. Similarly, if the SDF includes a domain of interest, that fragment of the SDF can be used to
make primers and, with appropriate template DNA, used to make a probe to identify genes containing the domain.
Alternatively, the PCR products can be resolved, for example by gel electrophoresis, and cloned and/or sequenced.
Using Southern hybridization, the variants of the domain among members of a gene family, both within and across
species, can be examined.
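Deriving a primer pair from the two ends of an SDF can be sketched as follows. The forward primer is taken from the 5' end of the given strand and the reverse primer is the reverse complement of the 3' end; the default length of 20 nucleotides falls within the 18-30 nucleotide range preferred for primers above but is otherwise arbitrary. No melting temperature or secondary structure checks are performed, and the function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]

def end_primers(sdf: str, length: int = 20) -> tuple:
    """Return (forward, reverse) primers taken from the ends of `sdf`.
    Both primers are written 5' to 3'."""
    if len(sdf) < 2 * length:
        raise ValueError("sequence too short for two non-overlapping primers")
    forward = sdf[:length]
    reverse = reverse_complement(sdf[-length:])
    return forward, reverse


if __name__ == "__main__":
    # Toy sequence and a short primer length so the result is easy to check by eye
    print(end_primers("ATGCCGTAAGCTTGGA", length=5))
```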

B.4. Isolating cDNA from Related Organisms

[0109] The SDFs of the invention can be used to isolate the corresponding DNA from other organisms. Either cDNA
or genomic DNA can be isolated. For isolating genomic DNA, a lambda, cosmid, BAC or YAC, or other large insert
genomic library from the plant of interest can be constructed using standard molecular biology techniques as described
in detail by Sambrook et al. 1989 (Molecular Cloning: A Laboratory Manual, 2nd ed. Cold Spring Harbor Laboratory
Press, New York) and by Ausubel et al. 1992 (Current Protocols in Molecular Biology, Greene Publishing, New York).

[0110] To screen a phage library, for example, recombinant lambda clones are plated out on appropriate bacterial
medium using an appropriate E. coli host strain. The resulting plaques are lifted from the plates using nylon or
nitrocellulose filters. The plaque lifts are processed through denaturation, neutralization, and washing treatments following
the standard protocols outlined by Ausubel et al. (1992). The plaque lifts are hybridized to either radioactively labeled
or non-radioactively labeled SDF DNA at room temperature for about 16 hours, usually in the presence of 50%
formamide and 5X SSC (sodium chloride and sodium citrate) buffer and blocking reagents. The plaque lifts are then washed
at 42°C with 1% Sodium Dodecyl Sulfate (SDS) and at a particular concentration of SSC. The SSC concentration used
is dependent upon the stringency at which hybridization occurred in the initial Southern blot analysis performed. For
example, if a fragment hybridized under medium stringency (e.g., Tm - 20°C), then this condition is maintained or
preferably adjusted to a less stringent condition (e.g., Tm - 30°C) to wash the plaque lifts. Positive clones show
detectable hybridization, e.g., by exposure to X-ray films or chromogen formation. The positive clones are then subsequently
isolated for purification using the same general protocol outlined above. Once the clone is purified, restriction analysis
can be conducted to narrow the region corresponding to the gene of interest. The restriction analysis and succeeding
subcloning steps can be done using procedures described by, for example, Sambrook et al. (1989) cited above.

[0111] The procedures outlined for the lambda library are essentially similar to those used for YAC library screening,
except that the YAC clones are harbored in bacterial colonies. The YAC clones are plated out at reasonable density
on nitrocellulose or nylon filters supported by appropriate bacterial medium in petri plates. Following the growth of the
bacterial clones, the filters are processed through the denaturation, neutralization, and washing steps following the
procedures of Ausubel et al. 1992. The same hybridization procedures for lambda library screening are followed.

[0112] To isolate cDNA, similar procedures using appropriately modified vectors are employed. For instance, the
library can be constructed in a lambda vector appropriate for cloning cDNA, such as λgt11. Alternatively, the cDNA
library can be made in a plasmid vector. cDNA for cloning can be prepared by any of the methods known in the art,
but is preferably prepared as described above. Preferably, a cDNA library will include a high proportion of full-length
clones.

B.5. Isolating and/or Identifying Orthologous Genes

[0113] Probes and primers of the invention can be used to identify and/or isolate polynucleotides related to those in
REF and SEQ TABLES 1 AND 2. Related polynucleotides are those that are native to other plant organisms and exhibit
either similar sequence or encode polypeptides with similar biological activity. One specific example is an orthologous
gene. Orthologous genes have the same functional activity. As such, orthologous genes may be distinguished from
homologous genes. The percentage of identity is a function of evolutionary separation and, in closely related species,
the percentage of identity can be 98 to 100%. The amino acid sequence of a protein encoded by an orthologous gene
can be less than 75% identical, but tends to be at least 75% or at least 80% identical, more preferably at least 90%,
most preferably at least 95% identical to the amino acid sequence of the reference protein. To find orthologous genes,
the probes are hybridized to nucleic acids from a species of interest under low stringency conditions, preferably one
where sequences containing as much as 40-45% mismatches will be able to hybridize. This condition is established
by Tm - 40°C to Tm - 48°C (see below). Blots are then washed under conditions of increasing stringency. It is preferable
that the wash stringency be such that sequences that are 85 to 100% identical will hybridize. More preferably, sequences
90 to 100% identical will hybridize, and most preferably only sequences greater than 95% identical will hybridize. One
of ordinary skill in the art will recognize that, due to degeneracy in the genetic code, amino acid sequences that are
identical can be encoded by DNA sequences as little as 67% identical or less. Thus, it is preferable, for example, to
make an overlapping series of shorter probes, on the order of 24 to 45 nucleotides, and individually hybridize them to
the same arrayed library to avoid the problem of degeneracy introducing large numbers of mismatches.
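
The following is a minimal sketch, not part of the disclosed procedures, of two ideas from the paragraph above: cutting a probe sequence into an overlapping series of short probes, and measuring how far two DNA sequences that encode the same peptide can diverge. The function names, the default probe length of 30 nucleotides, the 15-nucleotide step, and the example sequences are illustrative assumptions.

```python
def tile_probes(sequence, probe_len=30, step=15):
    """Return overlapping probes of probe_len nucleotides, one starting every `step` bases."""
    if not 24 <= probe_len <= 45:
        raise ValueError("probe_len is outside the 24 to 45 nucleotide range discussed in the text")
    last_start = len(sequence) - probe_len
    # An empty list is returned when the sequence is shorter than one probe.
    return [sequence[i:i + probe_len] for i in range(0, last_start + 1, step)]


def percent_identity(a, b):
    """Percentage of matching positions between two equal-length, already aligned sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


# Hypothetical example: both sequences encode Leu-Ser-Arg under the standard genetic code,
# yet they share few nucleotide positions because of codon degeneracy.
dna_a = "CTTAGTCGT"
dna_b = "TTGTCAAGA"
print(percent_identity(dna_a, dna_b))

# A 60-nucleotide hypothetical probe region cut into overlapping 30-mers.
print(len(tile_probes("ACGT" * 15)))
```

Short probes limit the number of degenerate positions any single probe spans, which is the motivation given in the text for hybridizing them individually.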

[0114] As evolutionary divergence increases, genome sequences also tend to diverge. Thus, one of skill will
recognize that searches for orthologous genes between more divergent species will require the use of lower stringency
conditions compared to searches between closely related species. Also, degeneracy of the genetic code is more of a
problem for searches in the genome of a species more distant evolutionarily from the species that is the source of the
SDF probe sequences.
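
As a rough illustration of how stringency relates to divergence, the sketch below uses the commonly cited rule of thumb that duplex Tm falls by about 1°C for each 1% of mismatched bases. That rule, the function names, and the default of 1.0°C per percent are assumptions introduced here, not values taken from this specification; the rule is broadly in line with the low stringency range of roughly 40 to 48°C below Tm described above for sequences with 40-45% mismatches.

```python
def tolerated_mismatch_percent(tm_offset_c, degrees_per_percent=1.0):
    """Estimate the percent mismatch tolerated when hybridizing tm_offset_c degrees below Tm.

    degrees_per_percent is an assumed rule-of-thumb constant, not a measured value.
    """
    if tm_offset_c < 0 or degrees_per_percent <= 0:
        raise ValueError("offset must be non-negative and the constant must be positive")
    return tm_offset_c / degrees_per_percent


def hybridization_temperature(tm_c, tm_offset_c):
    """Temperature (°C) that lies tm_offset_c degrees below a given Tm."""
    return tm_c - tm_offset_c


# Closely related species: small offset, few mismatches tolerated.
print(tolerated_mismatch_percent(10))
# Distant species: large offset, many mismatches tolerated.
print(tolerated_mismatch_percent(45))
```

The Tm itself is supplied by the caller, so the sketch makes no claim about which Tm formula should be used.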

[0115] The SDFs of the invention can also be used as probes to search for genes that are related to the SDF within
a species. Such related genes are typically considered to be members of a gene family. In such a case, the sequence
similarity will often be concentrated into one or a few fragments of the sequence. The fragments of similar sequence
that define the gene family typically encode a fragment of a protein or RNA that has an enzymatic or structural function.
The percentage of identity in the amino acid sequence of the domain that defines the gene family is preferably at least
70%, more preferably 80 to 95%, most preferably 85 to 99%. To search for members of a gene family within a species,
a low stringency hybridization is usually performed, but this will depend upon the size, distribution and degree of
sequence divergence of domains that define the gene family. SDFs encompassing regulatory regions can be used to
identify coordinately expressed genes by using the regulatory region sequence of the SDF as a probe.

[0116] In instances where the SDFs are identified as being expressed from genes that confer a particular
phenotype, the SDFs can also be used as probes to assay plants of different species for those phenotypes.

I.C. Methods to Inhibit Gene Expression

[0117] The nucleic acid molecules of the present invention can be used to inhibit gene transcription and/or translation.
Examples of such methods include, without limitation:

Antisense Constructs;
Ribozyme Constructs;
Chimeraplast Constructs;
Co-Suppression;
Transcriptional Silencing; and
Other Methods of Gene Expression.

C.1. Antisense

[0118] In some instances it is desirable to suppress expression of an endogenous or exogenous gene. A well-known
instance is the FLAVOR-SAVOR™ tomato, in which the gene encoding ACC synthase is inactivated by an antisense
approach, thus delaying softening of the fruit after ripening. See, for example, U.S. Patent No. 5,859,330; U.S. Patent
No. 5,723,766; Oeller et al., Science 254:437-439 (1991); and Hamilton et al., Nature 346:284-287 (1990). Also, timing
of flowering can be controlled by suppression of FLOWERING LOCUS C (FLC); high levels of this transcript are
associated with late flowering, while absence of FLC is associated with early flowering (S.D. Michaels et al., Plant Cell
11:949 (1999)). Also, the transition of apical meristem from production of leaves with associated shoots to flowering is
regulated by TERMINAL FLOWER1, APETALA1 and LEAFY. Thus, when it is desired to induce a transition from shoot
production to flowering, it is desirable to suppress TFL1 expression (S.J. Liljegren, Plant Cell 11:1007 (1999)). As
another instance, arrested ovule development and female sterility result from suppression of the ethylene forming
enzyme but can be reversed by application of ethylene (D. De Martinis et al., Plant Cell 11:1061 (1999)). The ability
to manipulate female fertility of plants is useful in increasing fruit production and creating hybrids.
[0119] In the case of polynucleotides used to inhibit expression of an endogenous gene, the introduced sequence
need not be perfectly identical to a sequence of the target endogenous gene. The introduced polynucleotide sequence
will typically be at least substantially identical to the target endogenous sequence.

[0120] Some polynucleotide SDFs in REF and SEQ TABLES 1 AND 2 represent sequences that are expressed in
corn, wheat, rice, soybean, Arabidopsis and/or other plants. Thus, the invention includes using these sequences to
generate antisense constructs to inhibit translation and/or degradation of transcripts of said SDFs, typically in a plant cell.
[0121] To accomplish this, a polynucleotide segment from the desired gene that can hybridize to the mRNA expressed
from the desired gene (the "antisense segment") is operably linked to a promoter such that the antisense strand of RNA
will be transcribed when the construct is present in a host cell. A regulated promoter can be used in the construct to
control transcription of the antisense segment so that transcription occurs only under desired circumstances.
[0122] The antisense segment to be introduced generally will be substantially identical to at least a fragment of the
endogenous gene or genes to be repressed. The sequence, however, need not be perfectly identical to inhibit
expression. Further, the antisense product may hybridize to the untranslated region instead of or in addition to the coding
sequence of the gene. The vectors of the present invention can be designed such that the inhibitory effect applies to
other proteins within a family of genes exhibiting homology or substantial homology to the target gene.
[0123] For antisense suppression, the introduced antisense segment sequence also need not be full length relative
to either the primary transcription product or the fully processed mRNA. Generally, a higher percentage of sequence
identity can be used to compensate for the use of a shorter sequence. Furthermore, the introduced sequence need
not have the same intron or exon pattern, and homology of noncoding segments may be equally effective. Normally,
a sequence of between about 30 or 40 nucleotides and the full length of the transcript can be used, though a sequence
of at least about 100 nucleotides is preferred, a sequence of at least about 200 nucleotides is more preferred, and a
sequence of at least about 500 nucleotides is especially preferred.
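
The antisense design described above can be sketched as follows. This is an illustrative outline only: the function names, the 0-based `start` coordinate, and the tier labels are assumptions made for the example, while the length thresholds (about 30, 100, 200 and 500 nucleotides) come from the paragraph above.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return dna.translate(COMPLEMENT)[::-1]


def antisense_insert(cdna, start, length):
    """Return the segment of `cdna` as it would read when placed in antisense orientation.

    `start` is a 0-based offset into the sense-strand cDNA (an assumed convention).
    """
    segment = cdna[start:start + length]
    if len(segment) < length:
        raise ValueError("requested segment runs past the end of the cDNA")
    return reverse_complement(segment)


def length_preference(n_nucleotides):
    """Classify a segment length using the thresholds stated in the text."""
    if n_nucleotides >= 500:
        return "especially preferred"
    if n_nucleotides >= 200:
        return "more preferred"
    if n_nucleotides >= 100:
        return "preferred"
    if n_nucleotides >= 30:
        return "usable"
    return "too short"


# Hypothetical cDNA fragment; a real antisense segment would normally be much longer.
print(antisense_insert("ATGGCCTTAGGC", 0, 6))
print(length_preference(250))
```

Transcribing the reverse-complemented insert from the promoter yields RNA complementary to the target mRNA, which is what allows the two to hybridize.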

C.2. Ribozymes

[0124] It is also contemplated that gene constructs representing ribozymes and based on the SDFs in REF AND
SEQ TABLES 1 AND 2 are an object of the invention. Ribozymes can also be used to inhibit expression of genes by
suppressing the translation of the mRNA into a polypeptide. It is possible to design ribozymes that specifically pair with
virtually any target RNA and cleave the phosphodiester backbone at a specific location, thereby functionally inactivating
the target RNA. In carrying out this cleavage, the ribozyme is not itself altered, and is thus capable of recycling and
cleaving other molecules, making it a true enzyme. The inclusion of ribozyme sequences within antisense RNAs confers
RNA-cleaving activity upon them, thereby increasing the activity of the constructs.
[0125] A number of classes of ribozymes have been identified. One class of ribozymes is derived from a number of
small circular RNAs, which are capable of self-cleavage and replication in plants. The RNAs replicate either alone (viroid
RNAs) or with a helper virus (satellite RNAs). Examples include RNAs from avocado sunblotch viroid and the satellite
RNAs from tobacco ringspot virus, lucerne transient streak virus, velvet tobacco mottle virus, Solanum nodiflorum mottle
virus and subterranean clover mottle virus. The design and use of target RNA-specific ribozymes is described in Haseloff
et al., Nature 334:585 (1988).

[0126] Like the antisense constructs above, the ribozyme sequence fragment necessary for pairing need not be
identical to the target nucleotides to be cleaved, nor identical to the sequences in REF AND SEQ TABLES 1 AND 2.
Ribozymes may be constructed by combining the ribozyme sequence and some fragment of the target gene which
would allow recognition of the target gene mRNA by the resulting ribozyme molecule. Generally, the sequence in the
ribozyme capable of binding to the target sequence exhibits a percentage of sequence identity of at least 80%,
preferably at least 85%, more preferably at least 90% and most preferably at least 95%, even more
preferably at least 96%, 97%, 98% or 99% sequence identity to some fragment of a sequence in REF AND SEQ
TABLES 1 AND 2 or the complement thereof. The ribozyme can be equally effective in inhibiting mRNA translation by
cleaving either in the untranslated or coding regions. Generally, a higher percentage of sequence identity can be used
to compensate for the use of a shorter sequence. Furthermore, the introduced sequence need not have the same
intron or exon pattern, and homology of non-coding segments may be equally effective.

C.3. Chimeraplasts 

[0127] The SDFs of the invention, such as those described by the REF and SEQ Tables, can also be used to construct
chimeraplasts that can be introduced into a cell to produce at least one specific nucleotide change in a sequence
corresponding to the SDF of the invention. A chimeraplast is an oligonucleotide comprising DNA and/or RNA that
specifically hybridizes to a target region in a manner which creates a mismatched base-pair. This mismatched
base-pair signals the cell's repair enzyme machinery, which acts on the mismatched region, resulting in the replacement,
insertion or deletion of designated nucleotide(s). The altered sequence is then expressed by the cell's normal cellular
mechanisms. Chimeraplasts can be designed to repair mutant genes, modify genes, introduce site-specific mutations,
and/or act to interrupt or alter normal gene function (US Pat. Nos. 6,010,907 and 6,004,804; and PCT Pub. Nos.
WO99/58723 and WO99/07865).

C.4. Sense Suppression

[0128] The SDFs of REF and SEQ TABLES 1 AND 2 of the present invention are also useful to modulate gene
expression by sense suppression. Sense suppression represents another method of gene suppression by introducing
at least one exogenous copy or fragment of the endogenous sequence to be suppressed.

[0129] Introduction of expression cassettes in which a nucleic acid is configured in the sense orientation with respect
to the promoter into the chromosome of a plant or in a self-replicating virus has been shown to be an effective means
by which to induce degradation of mRNAs of target genes. For an example of the use of this method to modulate
expression of endogenous genes, see Napoli et al., The Plant Cell 2:279 (1990), and U.S. Patents Nos. 5,034,323,
5,231,020, and 5,283,184. Inhibition of expression may require some transcription of the introduced sequence.

[0130] For sense suppression, the introduced sequence generally will be substantially identical to the endogenous
sequence intended to be inactivated. The minimal percentage of sequence identity will typically be greater than about
65%, but a higher percentage of sequence identity might exert a more effective reduction in the level of normal gene
products. Sequence identity of more than about 80% is preferred, though about 95% to absolute identity would be
most preferred. As with antisense regulation, the effect would likely apply to any other proteins within a similar family
of genes exhibiting homology or substantial homology to the suppressing sequence.

C.5. Transcriptional Silencing 

[0131] The nucleic acid sequences of the invention, including the SDFs of REF and SEQ TABLES 1 AND 2 and
fragments thereof, contain sequences that can be inserted into the genome of an organism resulting in transcriptional
silencing. Such regulatory sequences need not be operatively linked to coding sequences to modulate transcription of
a gene. Specifically, a promoter sequence without any other element of a gene can be introduced into a genome to
transcriptionally silence an endogenous gene (see, for example, Vaucheret, H. et al. (1998) The Plant Journal
16:651-659). As another example, triple helices can be formed using oligonucleotides based on sequences from REF
AND SEQ TABLES 1 AND 2, fragments thereof, and substantially similar sequence thereto. The oligonucleotide can
be delivered to the host cell and can bind to the promoter in the genome to form a triple helix and prevent transcription.
An oligonucleotide of interest is one that can bind to the promoter and block binding of a transcription factor to the
promoter. In such a case, the oligonucleotide can be complementary to the sequences of the promoter that interact
with transcription binding factors.

C.6. Other Methods to Inhibit Gene Expression

[0132] Yet another means of suppressing gene expression is to insert a polynucleotide into the gene of interest to
disrupt transcription or translation of the gene.

[0133] Low frequency homologous recombination can be used to target a polynucleotide insert to a gene by flanking
the polynucleotide insert with sequences that are substantially similar to the gene to be disrupted. Sequences from
REF AND SEQ TABLES 1 AND 2, fragments thereof, and substantially similar sequence thereto can be used for
homologous recombination.

[0134] In addition, random insertion of polynucleotides into a host cell genome can also be used to disrupt the gene
of interest (Azpiroz-Leehan et al., Trends in Genetics 13:152 (1997)). In this method, screening for clones from a library
containing random insertions is preferred to identify those that have polynucleotides inserted into the gene of interest.
Such screening can be performed using probes and/or primers described above based on sequences from REF AND
SEQ TABLES 1 AND 2, fragments thereof, and substantially similar sequence thereto. The screening can also be
performed by selecting clones or plants having a desired phenotype.

I.D. Methods of Functional Analysis

[0135] The constructs described in the methods under I.C. above can be used to determine the function of the
polypeptide encoded by the gene that is targeted by the constructs.

[0136] Down-regulating the transcription and translation of the targeted gene in the host cell or organisms, such as
a plant, may produce phenotypic changes, as compared to a wild-type cell or organism. In addition, in vitro assays can
be used to determine if any biological activity, such as calcium flux, DNA transcription, nucleotide incorporation, etc.,
is being modulated by the down-regulation of the targeted gene.

[0137] Coordinated regulation of sets of genes, e.g., those contributing to a desired polygenic trait, is sometimes
necessary to obtain a desired phenotype. SDFs of the invention representing transcription activation and DNA binding
domains can be assembled into hybrid transcriptional activators. These hybrid transcriptional activators can be used
with their corresponding DNA elements (i.e., those bound by the DNA-binding SDFs) to effect coordinated expression
of desired genes (J.J. Schwarz et al., Mol. Cell. Biol. 12:266 (1992); A. Martinez et al., Mol. Gen. Genet. 261:546
(1999)).

[0138] The SDFs of the invention can also be used in the two-hybrid genetic systems to identify networks of
protein-protein interactions (L. McAlister-Henn et al., Methods 19:330 (1999); J.C. Hu et al., Methods 20:80 (2000);
M. Golovkin et al., J. Biol. Chem. 274:36428 (1999); K. Ichimura et al., Biochem. Biophys. Res. Comm. 253:532 (1998)).
The SDFs of the invention can also be used in various expression display methods to identify important protein-DNA
interactions (e.g., B. Luo et al., J. Mol. Biol. 266:479 (1997)).



I.E. Promoters 

[0139] The SDFs of the invention are also useful as structural or regulatory sequences in a construct for modulating
the expression of the corresponding gene in a plant or other organism, e.g., a symbiotic bacterium. For example,
promoter sequences associated to SDFs of REF and SEQ TABLES 1 AND 2 of the present invention can be useful in
directing expression of coding sequences either as constitutive promoters or to direct expression in particular cell types,
tissues, or organs or in response to environmental stimuli.

[0140] With respect to the SDFs of the present invention, a promoter is likely to be a relatively small portion of a
genomic DNA (gDNA) sequence located in the first 2000 nucleotides upstream from an initial exon identified in a gDNA
sequence, or from an initial "ATG" or methionine codon or translational start site in a corresponding cDNA sequence. Such
promoters are more likely to be found in the first 1000 nucleotides upstream of an initial ATG or methionine codon or
translational start site of a cDNA sequence corresponding to a gDNA sequence. In particular, the promoter is usually
located upstream of the transcription start site. The fragments of a particular gDNA sequence that function as elements
of a promoter in a plant cell will preferably be found to hybridize to gDNA sequences presented and described in
REF AND SEQ TABLES 1 AND 2 at medium or high stringency, relevant to the length of the probe and its base
composition.
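
The upstream windows described above can be pulled out of a genomic sequence with a few lines of code. The sketch below is illustrative only: the function name, the 0-based `atg_index` argument, and the assumption that the gene lies on the strand as given are introduced for the example; the 2000 and 1000 nucleotide window sizes come from the text.

```python
def upstream_window(gdna, atg_index, window=2000):
    """Return up to `window` nucleotides immediately upstream of the ATG at atg_index.

    atg_index is the 0-based position of the 'A' of the start codon on the given strand.
    Fewer than `window` nucleotides are returned if the sequence starts too close to the ATG.
    """
    if gdna[atg_index:atg_index + 3].upper() != "ATG":
        raise ValueError("no ATG at the given index")
    start = max(0, atg_index - window)
    return gdna[start:atg_index]


# Hypothetical gDNA: 10 upstream bases, then a start codon and part of a coding region.
gdna = "CCGTTACGCA" + "ATG" + "GCCAAA"
print(upstream_window(gdna, 10, window=4))   # the narrower, more likely region
print(upstream_window(gdna, 10))             # the full default window, truncated at the sequence start
```

For a gene on the opposite strand, the sequence would first need to be reverse complemented so that "upstream" keeps its meaning.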

[0141] Promoters are generally modular in nature. Promoters can consist of a basal promoter that functions as a site
for assembly of a transcription complex comprising an RNA polymerase, for example RNA polymerase II. A typical
transcription complex will include additional factors such as TFIIB, TFIID, and TFIIE. Of these, TFIID appears to be the
only one to bind DNA directly. The promoter might also contain one or more enhancers and/or suppressors that function
as binding sites for additional transcription factors that have the function of modulating the level of transcription with
respect to tissue specificity and of transcriptional responses to particular environmental or nutritional factors, and the
like.

[0142] Short DNA sequences representing binding sites for proteins can be separated from each other by intervening
sequences of varying length. For example, within a particular functional module, protein binding sites may be constituted
by regions of 5 to 60, preferably 10 to 30, more preferably 10 to 20 nucleotides. Within such binding sites, there are
typically 2 to 6 nucleotides that specifically contact amino acids of the nucleic acid binding protein. The protein binding
sites are usually separated from each other by 10 to several hundred nucleotides, typically by 15 to 150 nucleotides,
often by 20 to 50 nucleotides. DNA binding sites in promoter elements often display dyad symmetry in their sequence.
Often elements binding several different proteins, and/or a plurality of sites that bind the same protein, will be combined
in a region of 50 to 1,000 basepairs.
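
Dyad symmetry, mentioned above, means that a site reads the same on both strands in the 5' to 3' direction, so the sequence equals its own reverse complement. A minimal check, with a hypothetical function name and example sites chosen for illustration, might look like this:

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def has_dyad_symmetry(site):
    """True if the site is identical to its own reverse complement (a perfect dyad)."""
    site = site.upper()
    return site == site.translate(COMPLEMENT)[::-1]


print(has_dyad_symmetry("CACGTG"))   # a perfectly symmetric hexamer
print(has_dyad_symmetry("GATTACA"))  # an asymmetric sequence
```

This tests only perfect symmetry; binding sites with one or two mismatched positions in the dyad would need a tolerance parameter that is not shown here.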

[0143] Elements that have transcription regulatory function can be isolated from their corresponding endogenous
gene, or the desired sequence can be synthesized, and recombined in constructs to direct expression of a coding
region of a gene in a desired tissue-specific, temporal-specific or other desired manner of inducibility or suppression.
When hybridizations are performed to identify or isolate elements of a promoter by hybridization to the long sequences
presented in REF AND SEQ TABLES 1 AND 2, conditions are adjusted to account for the above-described nature of
promoters. For example, short probes, constituting the element sought, are preferably used under low temperature
and/or high salt conditions. When long probes, which might include several promoter elements, are used, low to medium
stringency conditions are preferred when hybridizing to promoters across species.

[0144] If a nucleotide sequence of an SDF, or part of the SDF, functions as a promoter or fragment of a promoter,
then nucleotide substitutions, insertions or deletions that do not substantially affect the binding of relevant DNA binding
proteins would be considered equivalent to the exemplified nucleotide sequence. It is envisioned that there are
instances where it is desirable to decrease the binding of relevant DNA binding proteins to silence or down-regulate a
promoter, or conversely to increase the binding of relevant DNA binding proteins to enhance or up-regulate a promoter,
and vice versa. In such instances, polynucleotides representing changes to the nucleotide sequence of the DNA-protein
contact region by insertion of additional nucleotides, changes to identity of relevant nucleotides, including use of
chemically-modified bases, or deletion of one or more nucleotides are considered encompassed by the present invention.
In addition, fragments of the promoter sequences described by the REF and SEQ Tables, and variants thereof, can be
fused with other promoters or fragments to facilitate transcription and/or transcription in specific types of cells or under
specific conditions.

[0145] Promoter function can be assayed by methods known in the art, preferably by measuring activity of a reporter
gene operatively linked to the sequence being tested for promoter function. Examples of reporter genes include those
encoding luciferase, green fluorescent protein, GUS, neo, cat and bar.

[0146] Polynucleotides comprising untranslated (UTR) sequences and intron/exon junctions are also within the scope
of the invention. UTR sequences include introns and 5' or 3' untranslated regions (5' UTRs or 3' UTRs). Fragments of
the sequences shown in REF AND SEQ TABLES 1 AND 2 can comprise UTRs and intron/exon junctions.
[0147] These fragments of SDFs, especially UTRs, can have regulatory functions related to, for example, translation
rate and mRNA stability. Thus, these fragments of SDFs can be isolated for use as elements of gene constructs for
regulated production of polynucleotides encoding desired polypeptides.

[0148] Introns of genomic DNA segments might also have regulatory functions. Sometimes regulatory elements,
especially transcription enhancer or suppressor elements, are found within introns. Also, elements related to stability
of heteronuclear RNA and efficiency of splicing and of transport to the cytoplasm for translation can be found in intron
elements. Thus, these segments can also find use as elements of expression vectors intended for use to transform
plants.

[0149] Just as with promoters, UTR sequences and intron/exon junctions can vary from those shown in REF AND
SEQ TABLES 1 AND 2. Such changes from those sequences preferably will not affect the regulatory activity of the
UTRs or intron/exon junction sequences on expression, transcription, or translation unless selected to do so. However,
in some instances, down- or up-regulation of such activity may be desired to modulate traits or phenotypic or in vitro
activity.

I.G. Coding Sequences

[0150] Isolated polynucleotides of the invention can include coding sequences that encode polypeptides comprising
an amino acid sequence encoded by sequences in REF AND SEQ TABLES 1 AND 2 or an amino acid sequence
presented in REF AND SEQ TABLES 1 AND 2.
[0151] A nucleotide sequence encodes a polypeptide if a cell (or a cell-free in vitro system) expressing that nucleotide
sequence produces a polypeptide having the recited amino acid sequence when the nucleotide sequence is transcribed
and the primary transcript is subsequently processed and translated by a host cell (or a cell-free in vitro system)
harboring the nucleic acid. Thus, an isolated nucleic acid that encodes a particular amino acid sequence can be a genomic
sequence comprising exons and introns or a cDNA sequence that represents the product of splicing thereof. An isolated
nucleic acid encoding an amino acid sequence also encompasses heteronuclear RNA, which contains sequences that
are spliced out during expression, and mRNA, which lacks those sequences.

[0152] Coding sequences can be constructed using chemical synthesis techniques or by isolating coding sequences
or by modifying such synthesized or isolated coding sequences as described above.
[0153] In addition to coding sequences encoding the polypeptide sequences of REF AND SEQ TABLES 1 AND 2,
which are native to corn, Arabidopsis, soybean, rice, wheat, and other plants, the isolated polynucleotides can be
polynucleotides that encode variants, fragments, and fusions of those native proteins. Such polypeptides are described
below in part II.

[0154] In variant polynucleotides generally, the number of substitutions, deletions or insertions is preferably less than
20%, more preferably less than 15%; even more preferably less than 10%, 5%, 3% or 1% of the number of nucleotides
comprising a particularly exemplified sequence. It is generally expected that non-degenerate nucleotide sequence
changes that result in 1 to 10, more preferably 1 to 5 and most preferably 1 to 3 amino acid insertions, deletions or
substitutions will not greatly affect the function of an encoded polypeptide. The most preferred embodiments are those
wherein 1 to 20, preferably 1 to 10, most preferably 1 to 5 nucleotides are added to, deleted from and/or substituted
in the sequences specifically disclosed in REF AND SEQ TABLES 1 AND 2.

[0155] Insertions or deletions in polynucleotides intended to be used for encoding a polypeptide preferably preserve
the reading frame. This consideration is not so important in instances when the polynucleotide is intended to be used
as a hybridization probe.
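
Two of the checks implied above are simple arithmetic: an insertion or deletion preserves the reading frame only if its length is a multiple of three, and the extent of variation can be expressed as a percentage of the reference length. The helper names below are hypothetical and the sketch assumes the number of changes has already been counted from an alignment.

```python
def preserves_reading_frame(indel_length):
    """True if an insertion or deletion of this many nucleotides keeps downstream codons in frame."""
    return indel_length % 3 == 0


def percent_changed(n_changes, reference_length):
    """Substitutions, deletions and insertions as a percentage of the reference sequence length."""
    if reference_length <= 0:
        raise ValueError("reference_length must be positive")
    return 100.0 * n_changes / reference_length


# A 3-nucleotide deletion removes one codon; a 4-nucleotide deletion shifts the frame.
print(preserves_reading_frame(3), preserves_reading_frame(4))

# 15 changes in a hypothetical 300-nucleotide sequence, to compare with the 20%, 15%, 10%,
# 5%, 3% or 1% limits stated in the text.
print(percent_changed(15, 300))
```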

II. Polypeptides and Proteins

II.A. Native Polypeptides and Proteins

[0156] Polypeptides within the scope of the invention include both native proteins as well as variants, fragments,
and fusions thereof. Polypeptides of the invention are those encoded by any of the six reading frames of sequences
shown in REF AND SEQ TABLES 1 AND 2, preferably encoded by the three frames reading in the 5' to 3' direction of
the sequences as shown.
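
The six reading frames referred to above are the three codon offsets on the sequence as shown plus the three offsets on its reverse complement. The following sketch translates all six using the standard genetic code; the function names, the frame labels "+1" to "-3", and the use of "X" for codons containing unrecognized characters are choices made for this example.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, ordered with the first base varying slowest (TTT, TTC, TTA, TTG, TCT, ...).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna):
    """Translate complete codons from the start of `dna`; '*' marks a stop codon."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translations(dna):
    """Return translations for frames +1..+3 (as shown) and -1..-3 (reverse complement)."""
    dna = dna.upper()
    reverse = dna.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames


# Hypothetical 9-nucleotide sequence.
for frame, peptide in six_frame_translations("ATGGCCTAA").items():
    print(frame, peptide)
```

The frames labeled "+1" to "+3" correspond to the three frames reading in the 5' to 3' direction of the sequence as shown, which the text identifies as preferred.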

[0157] Native polypeptides include the proteins encoded by the sequences shown in REF AND SEQ TABLES 1 AND 
2. Such native polypeptides include those encoded by allelic variants. 

[0158] Polypeptide and protein variants will exhibit at least 75% sequence identity to those native polypeptides of
REF AND SEQ TABLES 1 AND 2. More preferably, the polypeptide variants will exhibit at least 85% sequence identity;
even more preferably, at least 90% sequence identity; more preferably at least 95%, 96%, 97%, 98%, or 99% sequence
identity. Fragments of polypeptides will exhibit similar percentages of sequence identity to
the relevant fragments of the native polypeptide. Fusions will exhibit a similar percentage of sequence identity in that
fragment of the fusion represented by the variant of the native peptide.

[0159] Furthermore, polypeptide variants will exhibit at least one of the functional properties of the native protein.
Such properties include, without limitation, protein interaction, DNA interaction, biological activity, immunological
activity, receptor binding, signal transduction, transcription activity, growth factor activity, secondary structure,
three-dimensional structure, etc. As to properties related to in vitro or in vivo activities, the variants preferably exhibit at least
60% of the activity of the native protein; more preferably at least 70%, even more preferably at least 80%, 85%, 90%
or 95% of at least one activity of the native protein.

[0160] One type of variant of native polypeptides comprises amino acid substitutions, deletions and/or insertions.
Conservative substitutions are preferred to maintain the function or activity of the polypeptide.
[0161] Within the scope of percentage of sequence identity described above, a polypeptide of the invention may
have additional individual amino acids or amino acid sequences inserted into the polypeptide in the middle thereof
and/or at the N-terminal and/or C-terminal ends thereof. Likewise, some of the amino acids or amino acid sequences
may be deleted from the polypeptide.

A.1 Antibodies 

[0162] Isolated polypeptides can be utilized to produce antibodies. Polypeptides of the invention can generally be
used, for example, as antigens for raising antibodies by known techniques. The resulting antibodies are useful as
reagents for determining the distribution of the antigen protein within the tissues of a plant or within a cell of a plant.
The antibodies are also useful for examining the production level of proteins in various tissues, for example in a
wild-type plant or following genetic manipulation of a plant, by methods such as Western blotting.

[0163] Antibodies of the present invention, both polyclonal and monoclonal, may be prepared by conventional
methods. In general, the polypeptides of the invention are first used to immunize a suitable animal, such as a mouse, rat,
rabbit, or goat. Rabbits and goats are preferred for the preparation of polyclonal sera due to the volume of serum
obtainable, and the availability of labeled anti-rabbit and anti-goat antibodies as detection reagents. Immunization is
generally performed by mixing or emulsifying the protein in saline, preferably in an adjuvant such as Freund's complete
adjuvant, and injecting the mixture or emulsion parenterally (generally subcutaneously or intramuscularly). A dose of
50-200 µg/injection is typically sufficient. Immunization is generally boosted 2-6 weeks later with one or more injections
of the protein in saline, preferably using Freund's incomplete adjuvant. One may alternatively generate antibodies by
in vitro immunization using methods known in the art, which for the purposes of this invention is considered equivalent
to in vivo immunization.

[0164] Polyclonal antisera are obtained by bleeding the immunized animal into a glass or plastic container, incubating
the blood at 25°C for one hour, followed by incubating the blood at 4°C for 2-18 hours. The serum is recovered by
centrifugation (e.g., 1,000 x g for 10 minutes). About 20-50 ml per bleed may be obtained from rabbits.

[0165] Monoclonal antibodies are prepared using the method of Kohler and Milstein, Nature 256:496 (1975), or a
modification thereof. Typically, a mouse or rat is immunized as described above. However, rather than bleeding the
animal to extract serum, the spleen (and optionally several large lymph nodes) is removed and dissociated into single
cells. If desired, the spleen cells can be screened (after removal of nonspecifically adherent cells) by applying a cell
suspension to a plate, or well, coated with the protein antigen. B-cells producing membrane-bound immunoglobulin
specific for the antigen bind to the plate, and are not rinsed away with the rest of the suspension. Resulting B-cells, or
all dissociated spleen cells, are then induced to fuse with myeloma cells to form hybridomas, and are cultured in a
selective medium (e.g., hypoxanthine, aminopterin, thymidine medium, "HAT"). The resulting hybridomas are plated by
limiting dilution, and are assayed for the production of antibodies which bind specifically to the immunizing antigen
(and which do not bind to unrelated antigens). The selected MAb-secreting hybridomas are then cultured either in vitro
(e.g., in tissue culture bottles or hollow fiber reactors), or in vivo (as ascites in mice).

[0166] Other methods for sustaining antibody-producing B-cell clones, such as by EBV transformation, are known.
[0167] If desired, the antibodies (whether polyclonal or monoclonal) may be labeled using conventional techniques.
Suitable labels include fluorophores, chromophores, radioactive atoms (particularly 32P and 125I), electron-dense
reagents, enzymes, and ligands having specific binding partners. Enzymes are typically detected by their activity. For
example, horseradish peroxidase is usually detected by its ability to convert 3,3',5,5'-tetramethylbenzidine (TMB) to a
blue pigment, quantifiable with a spectrophotometer.

[0168] Some polypeptides of the invention will have enzymatic activities that are useful in vitro. For example, the
soybean trypsin inhibitor (Kunitz) family is one of the numerous families of proteinase inhibitors. It comprises plant
proteins which have inhibitory activity against serine proteinases from the trypsin and subtilisin families, thiol
proteinases and aspartic proteinases. Thus, these peptides find in vitro use in protein purification protocols and perhaps in
therapeutic settings requiring topical application of protease inhibitors.

[0169] Delta-aminolevulinic acid dehydratase (EC 4.2.1.24) (ALAD) catalyzes the second step in the biosynthesis
of heme, the condensation of two molecules of 5-aminolevulinate to form porphobilinogen, and is also involved in
chlorophyll biosynthesis (Kaczor et al. (1994) Plant Physiol. 104: 1411-7; Smith (1988) Biochem. J. 249: 423-8; Schneider
(1976) Z. Naturforsch. [C] 31: 56-83). Thus, ALAD proteins can be used as catalysts in synthesis of heme derivatives.
Enzymes of biosynthetic pathways generally can be used as catalysts for in vitro synthesis of the compounds
representing products of the pathway.

[0170] Polypeptides encoded by SDFs of the invention can be engineered to provide purification reagents to identify
and purify additional polypeptides that bind to them. This allows one to identify proteins that function as multimers or
elucidate signal transduction or metabolic pathways. In the case of DNA binding proteins, the polypeptide can be used
in a similar manner to identify the DNA determinants of specific binding (S. Pierrou et al., Anal. Biochem. 229:99 (1995);
S. Chusacultanachai et al., J. Biol. Chem. 274:23591 (1999); Q. Lin et al., J. Biol. Chem. 272:27274 (1997)).

II.B. POLYPEPTIDE VARIANTS, FRAGMENTS, AND FUSIONS

[0171] Generally, variants, fragments, or fusions of the polypeptides encoded by the maximum length sequence
(MLS) can exhibit at least one of the activities of the identified domains and/or related polypeptides described in Sections
(C) and (D) of REF TABLES 1 and 2 corresponding to the MLS of interest.

II.B.(1) Variants

[0172] A type of variant of the native polypeptides comprises amino acid substitutions. Conservative substitutions,
described above (see II.), are preferred to maintain the function or activity of the polypeptide. Such substitutions include
conservation of charge, polarity, hydrophobicity, size, etc. For example, one or more amino acid residues within the
sequence can be substituted with another amino acid of similar polarity that acts as a functional equivalent, for example
providing a hydrogen bond in an enzymatic catalysis. Substitutes for an amino acid within an exemplified sequence
are preferably made among the members of the class to which the amino acid belongs. For example, the nonpolar
(hydrophobic) amino acids include alanine, leucine, isoleucine, valine, proline, phenylalanine, tryptophan and
methionine. The polar neutral amino acids include glycine, serine, threonine, cysteine, tyrosine, asparagine, and glutamine.
The positively charged (basic) amino acids include arginine, lysine and histidine. The negatively charged (acidic) amino
acids include aspartic acid and glutamic acid.
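
The four classes listed in this paragraph can be written down as a small lookup, sketched below purely for illustration. The class names and the functions `residue_class` and `is_conservative` are hypothetical; only the class membership comes from the text. A substitution is treated here as conservative when both residues fall in the same class, which is a simplification of the broader criteria (charge, polarity, hydrophobicity, size) mentioned above.

```python
# One-letter codes grouped by the four classes named in the text.
AMINO_ACID_CLASSES = {
    "nonpolar": set("ALIVPFWM"),       # Ala, Leu, Ile, Val, Pro, Phe, Trp, Met
    "polar_neutral": set("GSTCYNQ"),   # Gly, Ser, Thr, Cys, Tyr, Asn, Gln
    "basic": set("RKH"),               # Arg, Lys, His
    "acidic": set("DE"),               # Asp, Glu
}


def residue_class(residue: str) -> str:
    """Return the class name for a one-letter amino acid code."""
    for name, members in AMINO_ACID_CLASSES.items():
        if residue.upper() in members:
            return name
    raise ValueError(f"not one of the 20 standard residues: {residue!r}")


def is_conservative(original: str, substitute: str) -> bool:
    """True when a real substitution stays within one class."""
    if original.upper() == substitute.upper():
        return False  # no substitution took place
    return residue_class(original) == residue_class(substitute)
```

For example, leucine to isoleucine stays within the nonpolar class, whereas lysine to glutamate crosses from the basic to the acidic class.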

[0173] Within the scope of percentage of sequence identity described above, a polypeptide of the invention may
have additional individual amino acids or amino acid sequences inserted into the polypeptide in the middle thereof
and/or at the N-terminal and/or C-terminal ends thereof. Likewise, some of the amino acids or amino acid sequences
may be deleted from the polypeptide. Amino acid substitutions may also be made in the sequences, conservative
substitutions being preferred.

[0174] One preferred class of variants are those that comprise (1) the domain of an encoded polypeptide and/or (2)
residues conserved between the encoded polypeptide and related polypeptides. For this class of variants, the encoded
polypeptide sequence is changed by insertion, deletion, or substitution at positions flanking the domain and/or
conserved residues.

[0175] Another class of variants includes those that comprise an encoded polypeptide sequence that is changed in 
the domain or conserved residues by a conservative substitution. 

[0176] Yet another class of variants includes those that lack one of the in vitro activities, or structural features of the
encoded polypeptides. One example is polypeptides or proteins produced from genes comprising dominant negative
mutations. Such a variant may comprise an encoded polypeptide sequence with non-conservative changes in a
particular domain or group of conserved residues.

II.A.(2) FRAGMENTS

[0177] Fragments of particular interest are those that comprise a domain identified for a polypeptide encoded by an
MLS of the instant invention and variants thereof. Also, fragments that comprise at least one region of residues
conserved between an MLS encoded polypeptide and its related polypeptides are of great interest. Fragments are
sometimes useful as polypeptides corresponding to genes comprising dominant negative mutations.

II.A.(3) FUSIONS

[0178] Of interest are chimeras comprising (1) a fragment of the MLS encoded polypeptide or variants thereof of
interest and (2) a fragment of a polypeptide comprising the same domain. An example is an AP2 helix encoded by a
MLS of the invention fused to a second AP2 helix from ANT protein, which comprises two AP2 helices. The present
invention also encompasses fusions of MLS encoded polypeptides, variants, or fragments thereof fused with related
proteins or fragments thereof.

DEFINITION OF DOMAINS 

[0179] The polypeptides of the invention may possess identifying domains as shown in REF TABLES 1 and 2. Specific
domains within the MLS encoded polypeptides are indicated by the reference REF TABLES 1 and 2. In addition, the
domains within the MLS encoded polypeptide can be defined by the region that exhibits at least 70% sequence identity
with the consensus sequences listed in the detailed description below of each of the domains.
[0180] The majority of the protein domain descriptions given below are obtained from Prosite
(http://www.expasy.ch/prosite/) and Pfam
(http://pfam.wustl.edu/browse.shtml).

1. (AAA) AAA-protein family signature

[0181] A large family of ATPases has been described [1 to 5] whose key feature is that they share a conserved region
of about 220 amino acids that contains an ATP-binding site. This family is now called AAA, for 'A'TPases 'A'ssociated
with diverse cellular 'A'ctivities. The proteins that belong to this family contain either one or two AAA domains. Proteins
containing two AAA domains:

- Mammalian and Drosophila NSF (N-ethylmaleimide-sensitive fusion protein) and the fungal homolog, SEC18.
  These proteins are involved in intracellular transport between the endoplasmic reticulum and Golgi, as well as
  between different Golgi cisternae.
- Mammalian transitional endoplasmic reticulum ATPase (previously known as p97 or VCP), which is involved in the
  transfer of membranes from the endoplasmic reticulum to the Golgi apparatus. This protein forms a ring-shaped
  homooligomer composed of six subunits. The yeast homolog is CDC48 and it may play a role in spindle pole
  proliferation.
- Yeast protein PAS1, essential for peroxisome assembly, and the related protein PAS1 from Pichia pastoris.
- Yeast protein AFG2.
- Sulfolobus acidocaldarius protein SAV and Halobacterium salinarium cdcH, which may be part of a transduction
  pathway connecting light to cell division.

[0182] Proteins containing a single AAA domain:

- Escherichia coli and other bacteria ftsH (or hflB) protein. FtsH is an ATP-dependent zinc metallopeptidase that
  seems to degrade the heat-shock sigma-32 factor.

[0183] It is an integral membrane protein with a large cytoplasmic C-terminal domain that contains both the AAA and
the protease domains.

- Yeast protein YME1, a protein important for maintaining the integrity of the mitochondrial compartment. YME1 is
  also a zinc-dependent protease.
- Yeast protein AFG3 (or YTA10). This protein also seems to contain an AAA domain followed by a zinc-dependent
  protease domain.

[0184] Subunits from the regulatory complex of the 26S proteasome [6], which is involved in the ATP-dependent
degradation of ubiquitinated proteins:

a) Mammalian subunit 4 and homologs in other higher eukaryotes, in yeast (gene YTA5) and fission yeast (gene
mts2).
b) Mammalian subunit 6 (TBP7) and homologs in other higher eukaryotes and in yeast (gene YTA2).
c) Mammalian subunit 7 (MSS1) and homologs in other higher eukaryotes and in yeast (gene CIM5 or YTA3).
d) Mammalian subunit 8 (P45) and homologs in other higher eukaryotes, in yeast (SUG1 or CIM3 or TBY1)
and fission yeast (gene let1).

[0185] Other probable subunits such as human TBP1, which seems to influence HIV gene expression by interacting
with the virus tat transactivator protein, and yeast YTA1 and YTA6.

- Yeast protein BCS1, a mitochondrial protein essential for the expression of the Rieske iron-sulfur protein.
- Yeast protein MSP1, a protein involved in intramitochondrial sorting of proteins.
- Yeast protein PAS8, and the corresponding proteins PAS5 from Pichia pastoris and PAY4 from Yarrowia lipolytica.
- Mouse protein SKD1 and its fission yeast homolog (SpAC2G11.06).
- Caenorhabditis elegans meiotic spindle formation protein mei-1.
- Yeast protein SAP1.
- Yeast protein YTA7.
- Mycobacterium leprae hypothetical protein A2126A.

[0186] It is proposed that, in general, the AAA domains in these proteins act as ATP-dependent protein clamps [5].
In addition to the ATP-binding 'A' and 'B' motifs, which are located in the N-terminal half of this domain, there is a highly
conserved region located in the central part of the domain which was used to develop a signature pattern.
Consensus pattern: [LIVMT]-x-[LIVMT]-[LIVMF]-x-[GATMC]-[ST]-[NS]-x(4)-[LIVM]-D-x-A-[LIFA]-x-R
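
Consensus patterns in this PROSITE-style notation can be scanned against a protein sequence once translated into a regular expression. The sketch below is illustrative only and is not part of the specification: the helper `prosite_to_regex` is hypothetical, handles only the notation elements that appear in this section (single residues, `x`, bracketed alternatives, braced exclusions, and `(n)` or `(n,m)` repeats), and the pattern string reflects one reading of the AAA consensus pattern above.

```python
import re

# One pattern element: a core (x, a residue, [allowed] or {excluded})
# optionally followed by a repeat count (n) or a range (n,m).
_ELEMENT = re.compile(
    r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?"
)


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into Python regex syntax."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        match = _ELEMENT.fullmatch(element.strip())
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = match.groups()
        if core == "x":
            core = "."                       # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"   # any residue except those listed
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)


# Assumed reading of the AAA signature given in the text.
AAA_PATTERN = ("[LIVMT]-x-[LIVMT]-[LIVMF]-x-[GATMC]-[ST]-[NS]-x(4)-"
               "[LIVM]-D-x-A-[LIFA]-x-R")
AAA_REGEX = re.compile(prosite_to_regex(AAA_PATTERN))

# Synthetic test sequence (not a real protein): two flanking residues
# on each side of a stretch built to satisfy the pattern.
hit = AAA_REGEX.search("GS" + "LAIVAGSNAAAALDAALAR" + "GG")
```

A match object reports zero-based coordinates, so `hit.start() + 1` gives the conventional one-based position of the first residue of the signature.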

[1] Froehlich K.-U., Fries H.W., Ruediger M., Erdmann R., Botstein D., Mecke D. J. Cell Biol. 114:443-453(1991).
[2] Erdmann R., Wiebel F.F., Flessau A., Rytka J., Beyer A., Froehlich K.-U., Kunau W.-H. Cell 64:499-510(1991).
[3] Peters J.-M., Walsh M.J., Franke W.W. EMBO J. 9:1757-1767(1990).
[4] Kunau W.-H., Beyer A., Goette K., Marzioch M., Saidowsky J., Skaletz-Rorowski A., Wiebel F.F. Biochimie
75:209-224(1993).
[5] Confalonieri F., Duguet M. BioEssays 17:639-650(1995).
[6] Hilt W., Wolf D.H. Trends Biochem. Sci. 21:96-102(1996).

2. (ABC membrane) ABC transporter transmembrane region. This family represents a unit of six transmembrane
helices. Many members of the ABC transporter family (ABC_tran) have two such regions. See also descriptions of
ABC_tran below and ABC2_membrane above.

3. (ABC tran)

ABC transporters family signature

[0187] On the basis of sequence similarities, a family of related ATP-binding proteins has been characterized [1 to 5].
These proteins are associated with a variety of distinct biological processes in both prokaryotes and eukaryotes, but a
majority of them are involved in active transport of small hydrophilic molecules across the cytoplasmic membrane. All
these proteins share a conserved domain of some two hundred amino acid residues, which includes an ATP-binding
site. These proteins are collectively known as ABC transporters. Proteins known to belong to this family are listed
below (references are only provided for recently determined sequences).

In prokaryotes:

- Active transport systems components: alkylphosphonate uptake (phnC/phnK/phnL); arabinose (araG); arginine (artP);
  dipeptide (dciAD; dppD/dppF); ferric enterobactin (fepC); ferrichrome (fhuC); galactoside (mglA); glutamine (glnQ);
  glycerol-3-phosphate (ugpC); glycine betaine/L-proline (proV); glutamate/aspartate (gltL); histidine (hisP);
  iron(III) (sfuC); iron(III) dicitrate (fecE); lactose (lacK); leucine/isoleucine/valine (braF/braG; livF/livG);
  maltose (malK); molybdenum (modC); nickel (nikD/nikE); oligopeptide (amiE/amiF; oppD/oppF); peptide (sapD/sapF);
  phosphate (pstB); putrescine (potG); ribose (rbsA); spermidine/putrescine (potA); sulfate (cysA); vitamin B12 (btuD).
- Hemolysin/leukotoxin export proteins hlyB, cyaB and lktB.
- Colicin V export protein cvaB.
- Lactococcin export protein lcnC [6].
- Lantibiotic transport proteins nisT (nisin) and spaT (subtilin).
- Extracellular proteases B and C export protein prtD.
- Alkaline protease secretion protein aprD.
- Beta-(1,2)-glucan export proteins chvA and ndvA.
- Haemophilus influenzae capsule-polysaccharide export protein bexA.
- Cytochrome c biogenesis proteins ccmA (also known as cycV and helA).
- Polysialic acid transport protein kpsT.
- Cell division associated ftsE protein (function unknown).
- Copper processing protein nosF from Pseudomonas stutzeri.
- Nodulation protein nodI from Rhizobium (function unknown).
- Escherichia coli proteins cydC and cydD.
- Subunit A of the ABC excision nuclease (gene uvrA).
- Erythromycin resistance protein from Staphylococcus epidermidis (gene msrA).
- Tylosin resistance protein from Streptomyces fradiae (gene tlrC) [7].
- Heterocyst differentiation protein (gene hetA) from Anabaena PCC 7120.
- Protein P29 from Mycoplasma hyorhinis, a probable component of a high affinity transport system.
- yhbG, a putative protein whose gene is linked with ntrA in many bacteria such as Escherichia coli, Klebsiella
  pneumoniae, Pseudomonas putida, Rhizobium meliloti and Thiobacillus ferrooxidans.
- Escherichia coli and related bacteria hypothetical proteins yabJ, yadG, yagC, ybbA, ycjW, yddA, yehX, yejF, yheS,
  yhiG, yhiH, yjcW, yjjK, yojI, yrbF and ytfR.

In eukaryotes:

- The multidrug transporters (Mdr) (P-glycoprotein), a family of closely related proteins which extrude a wide variety
  of drugs out of the cell (for a review see [8]).
- Cystic fibrosis transmembrane conductance regulator (CFTR), which is most probably involved in the transport of
  chloride ions.
- Antigen peptide transporters 1 (TAP1, PSF1, RING4, HAM-1, mtp1) and 2 (TAP2, PSF2, RING11, HAM-2, mtp2), which
  are involved in the transport of antigens from the cytoplasm to a membrane-bound compartment for association with
  MHC class I molecules.
- 70 Kd peroxisomal membrane protein (PMP70).
- ALDP, a peroxisomal protein involved in X-linked adrenoleukodystrophy [9].
- Sulfonylurea receptor [10], a putative subunit of the B-cell ATP-sensitive potassium channel.
- Drosophila proteins white (w) and brown (bw), which are involved in the import of ommatidium screening pigments.
- Fungal elongation factor 3 (EF-3).
- Yeast STE6, which is responsible for the export of the a-factor pheromone.
- Yeast mitochondrial transporter ATM1.
- Yeast MDL1 and MDL2.
- Yeast SNQ2.
- Yeast sporidesmin resistance protein (gene PDR5 or STS1 or YDR1).
- Fission yeast heavy metal tolerance protein hmt1. This protein is probably involved in the transport of metal-bound
  phytochelatins.
- Fission yeast brefeldin A resistance protein (gene bfr1 or hba2).
- Fission yeast leptomycin B resistance protein (gene pmd1).
- mbpX, a hypothetical chloroplast protein from Liverwort.
- Prestalk-specific protein tagB from slime mold. This protein consists of two domains: an N-terminal subtilase
  catalytic domain and a C-terminal ABC transporter domain.

As a signature pattern for this class of proteins, a conserved region which is located between the 'A' and the 'B'
motifs of the ATP-binding site was used.

[0188] Consensus pattern: [LIVMFYC]-[SA]-[SAPGLVFYKQH]-G-[DENQMW]-[KRQASPCLIMFW]-[KRNQSTAVM]-
[KRACLVM]-[LIVMFYPAN]-{PHY}-[LIVMFW]-[SAGCLIVP]-{FYWHP}-{KRHP}-[LIVMFYWSTA]. The ATP-binding
region is duplicated in araG, mdl, msrA, rbsA, tlrC, uvrA, yejF, Mdr's, CFTR, pmd1 and in EF-3. In some of those proteins,
the above pattern only detects one of the two copies of the domain. The proteins belonging to this family also contain
one or two copies of the ATP-binding motifs 'A' and 'B'.
[1] Higgins C.F., Hyde S.C., Mimmack M.M., Gileadi U., Gill D.R., Gallagher M.P. J. Bioenerg. Biomembr.
22:571-592(1990).
[2] Higgins C.F., Gallagher M.P., Mimmack M.M., Pearce S.R. BioEssays 8:111-116(1988).
[3] Higgins C.F., Hiles I.D., Salmond G.P.C., Gill D.R., Downie J.A., Evans I.J., Holland I.B., Gray L., Buckel S.D.,
Bell A.W., Hermodson M.A. Nature 323:448-450(1986).
[4] Doolittle R.F., Johnson M.S., Husain I., van Houten B., Thomas D.C., Sancar A. Nature 323:451-453(1986).
[5] Blight M.A., Holland I.B. Mol. Microbiol. 4:873-880(1990).
[6] Stoddard G.W., Petzel J.P., van Belkum M.J., Kok J., McKay L.L. Appl. Environ. Microbiol. 58:1952-1961(1992).
[7] Rosteck P.R. Jr., Reynolds P.A., Hershberger C.L. Gene 102:27-32(1991).
[8] Gottesman M.M., Pastan I. J. Biol. Chem. 263:12163-12166(1988).
[9] Valle D., Gaertner J. Nature 361:682-683(1993).
[10] Aguilar-Bryan L., Nichols C.G., Wechsler S.W., Clement J.P. IV, Boyd A.E. III, Gonzalez G., Herrera-Sosa H.,
Nguy K., Bryan J., Nelson D.A. Science 268:423-426(1995).

4. (ACBP)

Acyl-CoA-binding protein signature

[0189] Acyl-CoA-binding protein (ACBP) is a small (10 Kd) protein that binds medium- and long-chain acyl-CoA
esters with very high affinity and may function as an intracellular carrier of acyl-CoA esters [1]. ACBP is also known
as diazepam binding inhibitor (DBI) or endozepine (EP) because of its ability to displace diazepam from the
benzodiazepine (BZD) recognition site located on the GABA type A receptor. It is therefore possible that this protein also acts
as a neuropeptide to modulate the action of the GABA receptor [2]. ACBP is a highly conserved protein of about 90
residues that has been so far found in vertebrates, insects and yeast. ACBP is also related to the N-terminal section
of a probable transmembrane protein of unknown function which has been found in mammals. As a signature pattern,
the region that corresponds to residues 19 to 37 in mammalian ACBP was selected.
Consensus pattern: P-[STA]-x-[DEN]-x-[LIVMF]-x(2)-[LIVMFY]-Y-[GSTA]-x-[FY]-K-Q-[STA](2)-x-G

[1] Rose T.M., Schultz E.R., Todaro G.J. Proc. Natl. Acad. Sci. U.S.A. 89:11287-11291(1992).
[2] Costa E., Guidotti A. Life Sci. 49:325-344(1991).

5. (AIRS)

AIR synthase related proteins

[0190] This family includes Hydrogen expression/formation protein HypE, AIR synthases, FGAM synthase and
selenide, water dikinase.

6. (AMP-binding)

Putative AMP-binding domain signature 

[0191] It has been shown [1 to 5] that a number of prokaryotic and eukaryotic enzymes, which all probably act via an
ATP-dependent covalent binding of AMP to their substrate, share a region of sequence similarity. These enzymes are:

- Insects luciferase (luciferin 4-monooxygenase). Luciferase produces light by catalyzing the oxidation of luciferin in
  presence of ATP and molecular oxygen.
- Alpha-aminoadipate reductase from yeast (gene LYS2). This enzyme catalyzes the activation of alpha-aminoadipate
  by ATP-dependent adenylation and the reduction of activated alpha-aminoadipate by NADPH.
- Acetate--CoA ligase (acetyl-CoA synthetase), an enzyme that catalyzes the formation of acetyl-CoA from acetate
  and CoA.
- Long-chain-fatty-acid--CoA ligase, an enzyme that activates long-chain fatty acids for both the synthesis of cellular
  lipids and their degradation via beta-oxidation.
- 4-coumarate--CoA ligase (4CL), a plant enzyme that catalyzes the formation of 4-coumarate-CoA from 4-coumarate
  and coenzyme A; the branchpoint reactions between general phenylpropanoid metabolism and pathways leading to
  various specific end products.
- O-succinylbenzoic acid--CoA ligase (OSB-CoA synthetase) (gene menE) [6], a bacterial enzyme involved in the
  biosynthesis of menaquinone (vitamin K2).
- 4-Chlorobenzoate--CoA ligase (EC 6.2.1.-) (4-CBA--CoA ligase) [7], a Pseudomonas enzyme involved in the
  degradation of 4-CBA.
- Indoleacetate--lysine ligase (IAA-lysine synthetase) [8], an enzyme from Pseudomonas syringae that converts
  indoleacetate to IAA-lysine.
- Bile acid-CoA ligase (gene baiB) from Eubacterium strain VPI 12708 [4]. This enzyme catalyzes the ATP-dependent
  formation of a variety of C-24 bile acid-CoA.
- Crotonobetaine/carnitine-CoA ligase (EC 6.3.2.-) from Escherichia coli (gene caiC).
- L-(alpha-aminoadipyl)-L-cysteinyl-D-valine synthetase (ACV synthetase) from various fungi (gene acvA or pcbAB).
  This enzyme catalyzes the first step in the biosynthesis of penicillin and cephalosporin, the formation of ACV from
  the constituent amino acids. The amino acids seem to be activated by adenylation. It is a protein of around 3700
  amino acids that contains three related domains of about 1000 amino acids.
- Gramicidin S synthetase I (gene grsA) from Bacillus brevis. This enzyme catalyzes the first step in the biosynthesis
  of the cyclic antibiotic gramicidin S, the ATP-dependent racemization of phenylalanine.
- Tyrocidine synthetase I (gene tycA) from Bacillus brevis. The reaction carried out by tycA is identical to that
  catalyzed by grsA.
- Gramicidin S synthetase II (gene grsB) from Bacillus brevis. This enzyme is a multifunctional protein that activates
  and polymerizes proline, valine, ornithine and leucine. GrsB consists of four related domains.
- Enterobactin synthetase components E (gene entE) and F (gene entF) from Escherichia coli. These two enzymes are
  involved in the ATP-dependent activation of, respectively, 2,3-dihydroxybenzoate and serine during enterobactin
  (enterochelin) biosynthesis.
- Cyclic peptide antibiotic surfactin synthase subunits 1, 2 and 3 from Bacillus subtilis. Subunits 1 and 2 contain
  three related domains while subunit 3 only contains a single domain.
- HC-toxin synthetase (gene HTS1) from Cochliobolus carbonum. This enzyme activates the four amino acids (Pro,
  L-Ala, D-Ala and 2-amino-9,10-epoxi-8-oxodecanoic acid) that make up HC-toxin, a cyclic tetrapeptide. HTS1
  consists of four related domains.

There are also some proteins, whose exact function is not yet known, but which are, very probably, also AMP-binding
enzymes. These proteins are:

- ORA (octapeptide-repeat antigen), a Plasmodium falciparum protein whose function is not known but which shows a
  high degree of similarity with the above proteins.
- AngR, a Vibrio anguillarum protein. AngR is thought to be a transcriptional activator which modulates the
  anguibactin (an iron-binding siderophore) biosynthesis gene cluster operon. But it is believed [9] that angR is not a
  DNA-binding protein but rather an enzyme involved in the biosynthesis of anguibactin. This conclusion is based on
  three facts: the presence of the AMP-binding domain; the size of angR (1048 residues), which is far bigger than any
  bacterial transcriptional protein; and the presence of a probable S-acyl thioesterase immediately downstream of angR.
- A hypothetical protein in mmsB 3' region in Pseudomonas aeruginosa.
- Escherichia coli hypothetical protein ydiD.
- Yeast hypothetical protein YBR041W.
- Yeast hypothetical protein YBR222c.
- Yeast hypothetical protein YER147c.

All these proteins contain a highly conserved region very rich in glycine, serine, and threonine which is followed by a
conserved lysine. A parallel can be drawn between this type of domain and the G-x(4)-G-K-[ST] ATP-/GTP-binding
'P-loop' domain or the protein kinases G-x-G-x(2)-[SG]-x(10,20)-K ATP-binding domains.

[0192] Consensus pattern: [LIVMFY]-x(2)-[STG]-[STAG]-G-[ST]-[STEI]-[SG]-x-[PASLIVM]-[KR]. In a majority of
cases the residue that follows the Lys at the end of the pattern is a Gly.
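
The remark about the residue following the terminal Lys can be checked mechanically. The sketch below is an illustration under stated assumptions: the regular expression is a hand translation of one reading of the AMP-binding consensus pattern above, the function name `amp_binding_hits` is hypothetical, and the example sequence is synthetic rather than a real enzyme.

```python
import re
from typing import List, Optional, Tuple

# Hand translation of the assumed pattern:
# [LIVMFY]-x(2)-[STG]-[STAG]-G-[ST]-[STEI]-[SG]-x-[PASLIVM]-[KR]
AMP_BINDING = re.compile(r"[LIVMFY].{2}[STG][STAG]G[ST][STEI][SG].[PASLIVM][KR]")


def amp_binding_hits(sequence: str) -> List[Tuple[int, str, Optional[str]]]:
    """Return (1-based start, matched segment, residue after the match).

    The third field is None when the match ends at the C-terminus.
    """
    hits = []
    for match in AMP_BINDING.finditer(sequence):
        end = match.end()
        following = sequence[end] if end < len(sequence) else None
        hits.append((match.start() + 1, match.group(0), following))
    return hits


# Synthetic sequence: a segment built to satisfy the pattern, then Gly.
example_hits = amp_binding_hits("MD" + "LAASSGSTSAPK" + "GE")
```

Counting how often the third field equals "G" across a set of family members would give a rough check of the "majority of cases" statement for that set.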

[1] Toh H. Protein Seq. Data Anal. 4:111-117(1991).
[2] Smith D.J., Earl A.J., Turner G. EMBO J. 9:2743-2750(1990).
[3] Schroeder J. Nucleic Acids Res. 17:460-460(1989).
[4] Mallonee D.H., Adams J.L., Hylemon P.B. J. Bacteriol. 174:2065-2071(1992).
[5] Turgay K., Krause M., Marahiel M.A. Mol. Microbiol. 6:529-546(1992).
[6] Driscoll J.R., Taber H.W. J. Bacteriol. 174:5063-5071(1992).
[7] Babbitt P.C., Kenyon G.L., Martin B.M., Charest H., Sylvestre M., Scholten J.D., Chang K.-H., Liang P.-H.,
Dunaway-Mariano D. Biochemistry 31:5594-5604(1992).
[8] Farrell D.H., Mikesell P., Actis L.A., Crosa J.H. Gene 86:45-51(1990).

7. AP2 domain

[0193] This 60 amino acid residue domain can bind to DNA [1]. This domain is plant specific. Members of this family
are suggested to be related to pyridoxal phosphate-binding domains such as found in aminotran_2 [3]. AP2 domains
are also described in Jofuku et al., copending US Patent applications 08/700,152, 08/879,827, 08/912,272, and
09/026,039.

[1] Ohme-Takagi M., Shinshi H. Plant Cell 1995; 7:173-182.
[2] Weigel D. Plant Cell 1995; 7:388-389.
[3] Mushegian A.R., Koonin E.V. Genetics 1996; 144:817-828.

8. ARID

[0194] The ARID domain is an AT-Rich Interaction domain sharing structural homology to DNA replication and repair
nucleases and polymerases.

[1] Herrscher R.F., Kaplan M.H., Lelsz D.L., Das C., Scheuermann R., Tucker P.W. Genes Dev. 1995; 9:3067-3082.
[2] Yuan Y.C., Whitson R.H., Liu Q., Itakura K., Chen Y. Nat. Struct. Biol. 1998; 5:959-964.
9. ATP synthase gamma subunit signature

[0195] ATP synthase (proton-translocating ATPase) (EC 3.6.1.34) is a component of the cytoplasmic membrane
of eubacteria, the inner membrane of mitochondria, and the thylakoid membrane of chloroplasts. The ATPase complex
is composed of an oligomeric transmembrane sector, called CF(0), and a catalytic core, called coupling factor CF(1).
The former acts as a proton channel; the latter is composed of five subunits: alpha, beta, gamma, delta and epsilon.
Subunit gamma is believed to be important in regulating ATPase activity and the flow of protons through the CF(0)
complex. The best conserved region of the gamma subunit [3] is its C-terminus, which seems to be essential for
assembly and catalysis. As a signature pattern to detect ATPase gamma subunits, a 14 residue conserved segment where
the last amino acid is found one to three residues from the C-terminal extremity was used.

[0196] Consensus pattern: [IV]-T-x-E-x(2)-[DE]-x(3)-G-A-x-[SAKR]. Note: Pea chloroplast gamma and two Bacillus
species gamma subunits are not detected by this motif.
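
Because this signature is defined partly by its position near the C-terminus, a scan can report how many residues follow each match. The sketch below is illustrative only. It assumes one reading of the consensus pattern above, interprets "one to three residues from the C-terminal extremity" as one to three residues remaining after the match (an assumption, since the wording admits other readings), and uses a hypothetical function name and a synthetic sequence.

```python
import re
from typing import List, Tuple

# Hand translation of the assumed pattern:
# [IV]-T-x-E-x(2)-[DE]-x(3)-G-A-x-[SAKR]  (14 residues)
GAMMA_SIGNATURE = re.compile(r"[IV]T.E.{2}[DE].{3}GA.[SAKR]")


def gamma_signature_hits(sequence: str) -> List[Tuple[int, int, int, bool]]:
    """Return (1-based start, 1-based end, residues after match, near C-terminus).

    The last field is True when one to three residues follow the match,
    which is the assumed reading of the positional requirement.
    """
    hits = []
    for match in GAMMA_SIGNATURE.finditer(sequence):
        trailing = len(sequence) - match.end()
        hits.append((match.start() + 1, match.end(), trailing, 1 <= trailing <= 3))
    return hits


# Synthetic sequences: the same 14-residue segment placed near the
# C-terminus and, for contrast, far from it.
near_end = gamma_signature_hits("MKKLL" + "ITAEAADAAAGAAS" + "LV")
far_from_end = gamma_signature_hits("MKKLL" + "ITAEAADAAAGAAS" + "LVLVLVLVLV")
```

Separating the pattern match from the positional check keeps the two criteria visible, so a hit that matches the residues but sits elsewhere in the protein is reported rather than silently dropped.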

[1] Futai M., Noumi T., Maeda M. Annu. Rev. Biochem. 58:111-136(1989).
[2] Senior A.E. Physiol. Rev. 68:177-231(1988).
[3] Miki J., Maeda M., Mukohata Y., Futai M. FEBS Lett. 232:221-226(1988).

10. (ATP Synt A)
ATP synthase a subunit signature

[0197] ATP synthase (proton-translocating ATPase) (EC 3.6.1.34) [1, 2] is a component of the cytoplasmic membrane
of eubacteria, the inner membrane of mitochondria, and the thylakoid membrane of chloroplasts. The ATPase complex
is composed of an oligomeric transmembrane sector, called CF(0), which acts as a proton channel, and a catalytic
core, termed coupling factor CF(1). The CF(0) a subunit, also called protein 6, is a key component of the proton channel;
it may play a direct role in translocating protons across the membrane. It is a highly hydrophobic protein that has been
predicted to contain 8 transmembrane regions [3]. Sequence comparison of a subunits from all available sources reveals
very few conserved regions. The best conserved region is located in what is predicted to be the fifth transmembrane
domain. This region contains three perfectly conserved residues: an arginine, a leucine and an asparagine.
Mutagenesis experiments [4] have shown that the arginine is essential for ATPase activity. This region was selected as a
signature pattern.

Consensus pattern: [STAGN]-x-[STAG]-[LIVMF]-R-L-x-[SAGV]-N-[LIVMT] [R is important for proton translocation]

[1] Futai M., Noumi T., Maeda M. Annu. Rev. Biochem. 58:111-136(1989).
[2] Senior A.E. Physiol. Rev. 68:177-231(1988).
[3] Lewis M.J., Chang J.A., Simoni R.D. J. Biol. Chem. 265:10541-10550(1990).
[4] Cain B.D., Simoni R.D. J. Biol. Chem. 264:3292-3300(1989).
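
Consensus patterns written in this notation can be translated into regular expressions for scanning sequences. The Python sketch below is a minimal illustration and not the tool used to derive the signatures. It assumes the usual reading of the notation: "x" is any residue, "[...]" lists allowed residues, "{...}" lists excluded residues, and "(n)" or "(n,m)" is a repeat count. The function name and the toy sequence are hypothetical.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style consensus pattern into a Python regex string."""
    pieces = []
    for element in pattern.strip().rstrip(".").split("-"):
        # Separate the residue specification from an optional repeat count,
        # e.g. "x(2)" -> ("x", "2", None) and "x(2,3)" -> ("x", "2", "3").
        match = re.fullmatch(
            r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?", element
        )
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        spec, low, high = match.groups()
        if spec == "x":
            regex = "."                      # any residue
        elif spec.startswith("{"):
            regex = "[^" + spec[1:-1] + "]"  # any residue except those listed
        else:
            regex = spec                     # literal residue or [allowed set]
        if low and high:
            regex += "{" + low + "," + high + "}"
        elif low:
            regex += "{" + low + "}"
        pieces.append(regex)
    return "".join(pieces)


# Consensus pattern of the ATP synthase a subunit signature, as given above.
A_SUBUNIT = "[STAGN]-x-[STAG]-[LIVMF]-R-L-x-[SAGV]-N-[LIVMT]"

if __name__ == "__main__":
    regex = prosite_to_regex(A_SUBUNIT)
    print(regex)
    toy = "MKKSASLRLAANLQQ"  # invented sequence, not a real a subunit
    hit = re.search(regex, toy)
    print(hit.start(), hit.group())
```

The same helper can be applied to the other consensus patterns in this section, provided they use only the notation elements listed above.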



[0198] Part of the CF(0) (base unit) of the ATP synthase. The base unit is thought to translocate protons through
the membrane (inner membrane in mitochondria, thylakoid membrane in plants, cytoplasmic membrane in bacteria). The
B subunits are thought to interact with the stalk of the CF(1) subunits.



ATP synthase c subunit signature 

[0199] ATP synthase (proton-translocating ATPase) [1,2] is a component of the cytoplasmic membrane of eubacteria,
the inner membrane of mitochondria, and the thylakoid membrane of chloroplasts. The ATPase complex is composed
of an oligomeric transmembrane sector, called CF(0), which acts as a proton channel, and a catalytic core, termed
coupling factor CF(1). The CF(0) c subunit (also called protein 9, proteolipid, or subunit III) [3,4] is a highly
hydrophobic protein of about 8 Kd which has been implicated in the proton-conducting activity of ATPase.
Structurally, subunit c consists of two long terminal hydrophobic regions, which probably span the membrane, and a
central hydrophilic region. N,N'-dicyclohexylcarbodiimide (DCCD) can bind covalently to subunit c and thereby
abolish the ATPase activity. DCCD binds to a specific glutamate or aspartate residue which is located in the middle
of the second hydrophobic region near the C-terminus of the protein. A signature pattern which includes the
DCCD-binding residue was derived.

[0200] Consensus pattern: [GSTA]-R-[NQ]-P-x(10)-[LIVMFYW](2)-x(3)-[LIVMFYW]-x-[DE] [D or E binds DCCD]

[1] Futai M., Noumi T., Maeda M. Annu. Rev. Biochem. 58:111-136(1989).
[2] Senior A.E. Physiol. Rev. 68:177-231(1988).
[3] Ivaschenko A.T., Karpenyuk T.A., Ponomarenko S.V. Biokhimiia 56:406-419(1991).
[4] Recipon H., Perasso R., Adoutte A., Quetier F. J. Mol. Evol. 34:292-303(1992).

13. (ATP synt DE) 

ATP synthase, Delta/Epsilon chain

[0201] Part of the ATP synthase CF(1). These subunits are part of the head unit of the ATP synthase. The subunits
are called delta and epsilon in human and metazoan species, but in bacterial species the delta (D) subunit is the
equivalent of the oligomycin sensitive subunit (OSCP) in metazoans.

14. (ATP synt ab)

ATP synthase alpha and beta subunits signature 

[0202] ATP synthase (proton-translocating ATPase) [1,2] is a component of the cytoplasmic membrane of eubacteria,
the inner membrane of mitochondria, and the thylakoid membrane of chloroplasts. The ATPase complex is composed
of an oligomeric transmembrane sector, called CF(0), and a catalytic core, called coupling factor CF(1). The former
acts as a proton channel; the latter is composed of five subunits, alpha, beta, gamma, delta and epsilon. The sequences
of subunits alpha and beta are related and both contain a nucleotide-binding site for ATP and ADP. The beta chain has
catalytic activity, while the alpha chain is a regulatory subunit. Vacuolar ATPases [3] (V-ATPases) are responsible for
acidifying a variety of intracellular compartments in eukaryotic cells. Like F-ATPases, they are oligomeric complexes
of a transmembrane and a catalytic sector. The sequence of the largest subunit of the catalytic sector (70 Kd) is related
to that of the F-ATPase beta subunit, while a 60 Kd subunit from the same sector is related to the F-ATPase alpha subunit
[4]. Archaebacterial membrane-associated ATPases are composed of three subunits. The alpha chain is related to the
F-ATPase beta chain and the beta chain is related to the F-ATPase alpha chain [4]. A protein highly similar to F-ATPase
beta subunits is found [5] in some bacterial apparatus involved in a specialized protein export pathway that proceeds
without signal peptide cleavage. This protein is known as fliI in Bacillus and Salmonella, Spa47 (mxiB) in Shigella
flexneri, HrpB6 in Xanthomonas campestris and yscN in Yersinia virulence plasmids. To detect these ATPase subunits,
a segment of ten amino-acid residues, containing two conserved serines, was selected as a signature pattern. The
first serine seems to be important for catalysis - in the ATPase alpha chain at least - as its mutagenesis causes catalytic
impairment.

[0203] Consensus pattern: P-[SAP]-[LIV]-[DNH]-x(3)-S-x-S [The first S is a putative active site residue]

[1] Futai M., Noumi T., Maeda M. Annu. Rev. Biochem. 58:111-136(1989).
[2] Senior A.E. Physiol. Rev. 68:177-231(1988).
[3] Nelson N. J. Bioenerg. Biomembr. 21:553-571(1989).
[4] Gogarten J.P., Kibak H., Dittrich P., Taiz L., Bowman E.J., Bowman B.J., Manolson M.F., Poole R.J., Date T.,
Oshima T., Konishi J., Denda K., Yoshida M. Proc. Natl. Acad. Sci. U.S.A. 86:6661-6665(1989).
[5] Dreyfus G., Williams A.W., Kawagishi I., MacNab R.M. J. Bacteriol. 175:3131-3138(1993).
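
Because the first serine of the alpha/beta signature is annotated as a putative active site residue, it can be useful to report its position rather than only the presence of a hit. The Python sketch below is a hedged illustration. The regular expression is a hand translation that assumes the pattern reads P-[SAP]-[LIV]-[DNH]-x(3)-S-x-S; the function name and the toy sequences are hypothetical.

```python
import re
from typing import List

# Hand translation of the assumed pattern P-[SAP]-[LIV]-[DNH]-x(3)-S-x-S.
# The first serine is captured as group 1; wrapping the expression in a
# lookahead lets overlapping hits be reported as well.
ALPHA_BETA = re.compile(r"(?=P[SAP][LIV][DNH].{3}(S).S)")


def putative_active_site_serines(sequence: str) -> List[int]:
    """Return 1-based positions of the first serine of every signature hit."""
    return [m.start(1) + 1 for m in ALPHA_BETA.finditer(sequence.upper())]


if __name__ == "__main__":
    print(putative_active_site_serines("GGPAIDPTTSRSKK"))  # invented sequence with one hit
    print(putative_active_site_serines("GGPAIDPTTARSKK"))  # first serine replaced: no hit
```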

15. (ATP synt ab C)

ATP synthase ab C terminal.

[0204] Number of members: 190

[1] Abrahams JP, Leslie AG, Lutter R, Walker JE: "Structure at 2.8 A resolution of F1-ATPase from bovine heart
mitochondria." Nature 1994;370:621-628.

16. (A deaminase) 

Adenosine and AMP deaminase signature 

[0205] Adenosine deaminase catalyzes the hydrolytic deamination of adenosine into inosine. AMP deaminase
catalyzes the hydrolytic deamination of AMP into IMP. It has been shown [1] that these two types of enzymes share three
regions of sequence similarities; these regions are centered on residues which are proposed to play an important role
in the catalytic mechanism of these two enzymes. One of these regions, containing two conserved aspartic acid residues
that are potential active site residues, was selected.
Consensus pattern: [SA]-[LIVM]-[NGS]-[STA]-D-D-P [The two D's are putative active site residues]
[1] Chang Z., Nygaard P., Chinault A.C., Kellems R.E. Biochemistry 30:2273-2280(1991).

17. (Acetyltransf)

Acetyltransferase (GNAT) family.

[0206] This family contains proteins with N-acetyltransferase functions.
[1] Neuwald AF, Landsman D: Trends Biochem Sci 1997;22:154-155.

18. (Aconitase C)

Aconitase family signature

[0207] Aconitase (aconitate hydratase) (EC 4.2.1.3) [1] is the enzyme from the tricarboxylic acid cycle that catalyzes
the reversible isomerization of citrate and isocitrate. Cis-aconitate is formed as an intermediary product during the
course of the reaction. In eukaryotes two isozymes of aconitase are known to exist: one found in the mitochondrial
matrix and the other found in the cytoplasm. Aconitase, in its active form, contains a 4Fe-4S iron-sulfur cluster; three
cysteine residues have been shown to be ligands of the 4Fe-4S cluster. It has been shown that the aconitase family
also contains the following proteins: - Iron-responsive element binding protein (IRE-BP). IRE-BP is a cytosolic protein
that binds to iron-responsive elements (IREs). IREs are stem-loop structures found in the 5'UTR of ferritin and delta
aminolevulinic acid synthase mRNAs, and in the 3'UTR of transferrin receptor mRNA. IRE-BP also expresses aconitase
activity. - 3-isopropylmalate dehydratase (EC 4.2.1.33) (isopropylmalate isomerase), the enzyme that catalyzes the
second step in the biosynthesis of leucine. - Homoaconitase (EC 4.2.1.36) (homoaconitate hydratase), an enzyme that
participates in the alpha-aminoadipate pathway of lysine biosynthesis and that converts cis-homoaconitate into
homoisocitric acid. - Escherichia coli protein ybhJ. As a signature for proteins from the aconitase family, two conserved
regions that contain the three cysteine ligands of the 4Fe-4S cluster were selected.

Consensus pattern: [LIVM]-x(2)-[GSACIVM]-x-[LIV]-... [C binds the iron-sulfur center]

Consensus pattern: G-x(2)-[LIVWPQ]-x(3)-[GAC]-C-[GSTAM]-[LIMPTA]-C-[LIMV]-[GA] [The two C's bind the iron-sulfur
center]

[1] Gruer M.J., Artymiuk P.J., Guest J.R. Trends Biochem. Sci. 22:3-6(1997).

19. (Acyl-CoA dh) 

Acyl-CoA dehydrogenases signatures

[0208] Acyl-CoA dehydrogenases [1,2,3] are enzymes that catalyze the alpha,beta-dehydrogenation of acyl-CoA
esters and transfer electrons to ETF, the electron transfer protein. Acyl-CoA dehydrogenases are FAD flavoproteins.
This family currently includes: - Five eukaryotic isozymes that catalyze the first step of the beta-oxidation cycles for
fatty acids with various chain lengths. These are short (SCAD) (EC 1.3.99.2), medium (MCAD) (EC 1.3.99.3), long
(LCAD) (EC 1.3.99.13), very-long (VLCAD) and short/branched (SBCAD) chain acyl-CoA dehydrogenases. These
enzymes are located in the mitochondrion. They are all homotetrameric proteins of about 400 amino acid residues
except VLCAD, which is a dimer and which contains, in its mature form, about 600 residues. - Glutaryl-CoA
dehydrogenase (EC 1.3.99.7) (GCDH), which is involved in the catabolism of lysine, hydroxylysine and tryptophan.
- Isovaleryl-CoA dehydrogenase (EC 1.3.99.10) (IVD), involved in the catabolism of leucine. - Acyl-CoA
dehydrogenases acdA and mmgC from Bacillus subtilis. - Butyryl-CoA dehydrogenase (EC 1.3.99.2) from Clostridium
acetobutylicum. - Escherichia coli protein caiA [4]. - Escherichia coli protein aidB. Two conserved regions were
selected as signature patterns. The first is located in the center of these enzymes, the second in the C-terminal section.

Consensus pattern: [GAC]-[LIVM]-[ST]-E-x(2)-[GSAN]-G-[ST]-D-x(2)-[GSA]

Consensus pattern: [QDE]-x(2)-G-[GS]-x-G-[LIVMFY]-x(2)-[DEN]-x(4)-[KR]-x(3)-[DEN]

[1] Tanaka K., Ikeda Y., Matsubara Y., Hyman D.B. Enzyme 38:91-107(1987).
[2] Matsubara Y., Indo Y., Naito E., Ozasa H., Glassberg R., Vockley J., Ikeda Y., Kraus J., Tanaka K. J. Biol.
Chem. 264:16321-16331(1989).
[3] Aoyama T., Ueno I., Kamijo T., Hashimoto T. J. Biol. Chem. 269:19088-19094(1994).
[4] Eichler K., Bourgis F., Buchet A., Kleber H.-P., Mandrand-Berthelot M.-A. Mol. Microbiol. 13:775-788(1994).

20. (Acyl transf)

Acyl transferase domain

[0209] Number of members: 161
[1] Serre L, Verbree EC, Dauter Z, Stuitje AR, Derewenda ZS: Medline: 95288570. "The Escherichia coli malonyl-CoA:
acyl carrier protein transacylase at 1.5-A resolution. Crystal structure of a fatty acid synthase component." J Biol Chem
1995;270:12961-12964.

21. Acylphosphatase signatures

[0210] Acylphosphatase (EC 3.6.1.7) [1,2] catalyzes the hydrolysis of various acyl phosphate carboxyl-phosphate
bonds, such as carbamyl phosphate, succinyl phosphate, 1,3-diphosphoglycerate, etc. The physiological role of this
enzyme is not yet clear. Acylphosphatase is a small protein of around 100 amino-acid residues. There are two known
isozymes. One seems to be specific to muscular tissues; the other, called 'organ-common type', is found in many
different tissues. While acylphosphatases have so far been characterized only in vertebrates, there are a number of
bacterial and archaebacterial hypothetical proteins that are highly similar to that enzyme and that probably possess the
same activity. These proteins are: - Escherichia coli hypothetical protein yccX. - Bacillus subtilis hypothetical protein
yflL. - Archaeoglobus fulgidus hypothetical protein AF0518. Two conserved regions were selected as signature
patterns. The first is located in the N-terminal section, while the second is found in the central part of the protein
sequence.

Consensus pattern: [LIV]-x-G-x-V-Q-G-V-x-[FM]-R

Consensus pattern: G-[FYW]-[AVC]-[KRQAM]-N-x(3)-G-x-V-x(5)-G

[1] Stefani M., Ramponi G. Life Chem. Rep. 12:271-301(1995).
[2] Stefani M., Taddei N., Ramponi G. Cell. Mol. Life Sci. 53:141-151(1997).
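
When a family is described by two signatures, as for acylphosphatase, a candidate sequence can be checked for both, and for their relative order (the first in the N-terminal section, the second in the central part). The Python sketch below is one possible way to do this, not a description of the method used here. The two regular expressions are hand translations that assume the patterns read [LIV]-x-G-x-V-Q-G-V-x-[FM]-R and G-[FYW]-[AVC]-[KRQAM]-N-x(3)-G-x-V-x(5)-G; all names and the toy sequence are hypothetical.

```python
import re

# Hand translations of the two acylphosphatase consensus patterns (assumed readings).
SIGNATURES = {
    "acylphosphatase_1": re.compile(r"[LIV].G.VQGV.[FM]R"),
    "acylphosphatase_2": re.compile(r"G[FYW][AVC][KRQAM]N.{3}G.V.{5}G"),
}


def scan(sequence):
    """Map each signature name to the 0-based start of its first hit, or None."""
    seq = sequence.upper()
    hits = {}
    for name, regex in SIGNATURES.items():
        match = regex.search(seq)
        hits[name] = match.start() if match else None
    return hits


def looks_like_acylphosphatase(sequence):
    """True when both signatures occur and the first one precedes the second."""
    hits = scan(sequence)
    first, second = hits["acylphosphatase_1"], hits["acylphosphatase_2"]
    return first is not None and second is not None and first < second


if __name__ == "__main__":
    # Invented sequence carrying one copy of each motif; not a real protein.
    toy = "MA" + "VKGKVQGVFFR" + "AWVQ" + "GFAKNTEDGSVEGEVQG" + "PK"
    print(scan(toy))
    print(looks_like_acylphosphatase(toy))
```

Requiring both hits is stricter than accepting either one; which rule is appropriate depends on how many false positives can be tolerated.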

22. (Adap comp sub) 

Clathrin adaptor complexes medium chain signatures 

[0211] Clathrin coated vesicles (CCV) mediate intracellular membrane traffic such as receptor mediated endocytosis.
In addition to clathrin, the CCV are composed of a number of other components including oligomeric complexes which
are known as adaptor or clathrin assembly proteins (AP) complexes [1]. The adaptor complexes are believed to interact
with the cytoplasmic tails of membrane proteins, leading to their selection and concentration. In mammals two types of
adaptor complexes are known: AP-1, which is associated with the Golgi complex, and AP-2, which is associated with
the plasma membrane. Both AP-1 and AP-2 are heterotetramers that consist of two large chains - the adaptins -
(gamma and beta' in AP-1; alpha and beta in AP-2); a medium chain (AP47 in AP-1; AP50 in AP-2) and a small chain
(AP19 in AP-1; AP17 in AP-2). The medium chains of AP-1 and AP-2 are evolutionary related proteins of about 50 Kd.
Homologs of AP47 and AP50 have also been found in Caenorhabditis elegans (genes unc-101 and ap50) [2] and yeast
(gene APM1 or YAP54) [3]. Some more divergent, but clearly evolutionary related proteins have also been found in
yeast: APM2 and YBR288c. Two conserved regions were selected as signature patterns, one located in the N-terminal
region, the other from the central section of these proteins.

Consensus pattern: [IVT]-[GSP]-W-R-x(2,3)-[GAD]-x(2)-[HY]-x(2)-N-x-[LIVMAFY](3)-D-[LIVM]-[LIVMT]-E
Consensus pattern: [LIV]-x-F-I-P-P-x-G-x-[LIVMFY]-x-L-x(2)-Y

[1] Pearse B.M., Robinson M.S. Annu. Rev. Cell Biol. 6:151-171(1990).
[2] Lee J., Jongeward G.D., Sternberg P.W. Genes Dev. 8:60-73(1994).
[3] Nakayama Y., Goebl M., O'Brine G.B., Lemmon S., Pingchang C.E., Kirchhausen T. Eur. J. Biochem. 202:
569-574(1991).



23. (Adenylsucc synt) 
Adenylosuccinate synthetase signatures 

[0212] Adenylosuccinate synthetase (EC 6.3.4.4) [1] plays an important role in purine biosynthesis, by catalyzing the
GTP-dependent conversion of IMP and aspartic acid to AMP. Adenylosuccinate synthetase has been characterized
from various sources ranging from Escherichia coli (gene purA) to vertebrate tissues. In vertebrates, two isozymes are
present - one involved in purine biosynthesis and the other in the purine nucleotide cycle. Two conserved regions were
selected as signature patterns. The first one is a perfectly conserved octapeptide located in the N-terminal section and
which is involved in GTP-binding [2]. The second one includes a lysine residue known [2] to be essential for the
enzyme's activity.

Consensus pattern: Q-W-G-D-E-G-K-G

Consensus pattern: G-I-[GR]-P-x-Y-x(2)-K-x(2)-R [K is the active site residue]

[1] Wiesmueller L., Wittbrodt J., Noegel A.A., Schleicher M. J. Biol. Chem. 266:2480-2485(1991).
[2] Silva M.M., Poland B.W., Hoffman C.R., Fromm H.J., Honzatko R.B. J. Mol. Biol. 254:431-446(1995).
[3] Bouyoub A., Barbier G., Forterre P., Labedan B. J. Mol. Biol. 261:144-154(1996).
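
The first adenylosuccinate synthetase signature is a fully invariant octapeptide, so a plain substring search is enough to locate it. The Python sketch below is illustrative only. The one-third cutoff used to decide whether a hit lies in the N-terminal section is an arbitrary assumption made for this example, and the function name and toy sequences are hypothetical.

```python
OCTAPEPTIDE = "QWGDEGKG"  # first consensus pattern, Q-W-G-D-E-G-K-G


def find_gtp_binding_octapeptide(sequence, n_terminal_fraction=1 / 3):
    """Locate the octapeptide and say whether it starts in the N-terminal part.

    The one-third cutoff is an arbitrary illustrative choice, not a value
    taken from the text.
    """
    seq = sequence.upper()
    start = seq.find(OCTAPEPTIDE)      # -1 when the octapeptide is absent
    if start == -1:
        return None
    in_n_terminal = start < len(seq) * n_terminal_fraction
    return start + 1, in_n_terminal    # 1-based start position


if __name__ == "__main__":
    # Invented sequences, not real adenylosuccinate synthetases.
    print(find_gtp_binding_octapeptide("MSNVVVLGT" + OCTAPEPTIDE + "K" * 40))
    print(find_gtp_binding_octapeptide("K" * 40 + OCTAPEPTIDE))
    print(find_gtp_binding_octapeptide("K" * 40))
```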

24. (AdoHcyase) 

S-adenosyl-L-homocysteine hydrolase signatures

[0213] S-adenosyl-L-homocysteine hydrolase (EC 3.3.1.1) (AdoHcyase) is an enzyme of the activated methyl cycle,
responsible for the reversible hydration of S-adenosyl-L-homocysteine into adenosine and homocysteine. AdoHcyase
is a ubiquitous enzyme which binds and requires NAD+ as a cofactor. AdoHcyase is a highly conserved protein
[1] of about 430 to 470 amino acids. Two highly conserved regions were selected as signature patterns. The first pattern
is located in the N-terminal section; the second is derived from a glycine-rich region in the central part of AdoHcyase,
a region thought to be involved in NAD-binding.

Consensus pattern: [GSA]-[CS]-N-x-[FYLM]-S-[ST]-[QA]-[DEN]-x-[AV]-[AT]-[AD]-[AC]-[LIVMCG]
Consensus pattern: [GA]-[KS]-x(3)-[LIV]-x-G-[FY]-G-x-[VC]-G-[KRL]-G-x-[ASC]

[1] Sganga M.W., Aksamit R.R., Cantoni G.L., Bauer C.E. Proc. Natl. Acad. Sci. U.S.A. 89:6328-6332(1992).

25. AhpC/TSA family 

[0214] This family contains proteins related to alkyl hydroperoxide reductase (AhpC) and thiol specific
antioxidant (TSA).

[1] Chae HZ, Robison K, Poole LB, Church G, Storz G, Rhee SG. Proc Natl Acad Sci USA 1994;91:7017-7021.

26. (Aldose epim) 

Aldose 1-epimerase putative active site

[0215] Aldose 1-epimerase (EC 5.1.3.3) (mutarotase) is the enzyme responsible for the anomeric interconversion of
D-glucose and other aldoses between their alpha- and beta-forms. The sequence of mutarotase from two bacteria,
Acinetobacter calcoaceticus and Streptococcus thermophilus, is available [1]. It has also been shown that, on the basis
of extensive sequence similarities, a mutarotase domain seems to be present in the C-terminal half of the fungal GAL10
protein which encodes, in the N-terminal part, for UDP-glucose 4-epimerase. The best conserved region in the sequence
of mutarotase is centered around a conserved histidine residue which may be involved in the catalytic mechanism.
Consensus pattern: [NS]-x-T-N-H-x-Y-[FW]-N-[LI]

[1] Poolman B., Royer T.J., Mainzer S.E., Schmidt B.F. J. Bacteriol. 172:4037-4047(1990).

27. (AlkA DNA repair) 

Alkylbase DNA glycosidases alkA family signature 

[0216] Alkylbase DNA glycosidases [1] are DNA repair enzymes that hydrolyze the deoxyribose N-glycosidic bond
to excise various alkylated bases from a damaged DNA polymer. In Escherichia coli there are two alkylbase DNA
glycosidases: one (gene tag) which is constitutively expressed and which is specific for the removal of 3-methyladenine
(EC 3.2.2.20), and one (gene alkA) which is induced during adaptation to alkylation and which can remove a variety
of alkylation products (EC 3.2.2.21). Tag and alkA do not share any region of sequence similarity. In yeast there is an
alkylbase DNA glycosidase (gene MAG1) [2,3], which can remove 3-methyladenine or 7-methyladenine and which is
structurally related to alkA. MAG and alkA are both proteins of about 300 amino acid residues. While the C- and
N-terminal ends appear to be unrelated, there is a central region of about 130 residues which is well conserved. A
portion of this region has been selected as a signature pattern.

Consensus pattern: G-I-G-x-W-[ST]-[AV]-x-[LIVMFY](2)-x-[LIVM]-x(8)-[MF]-x(2)-[ED]-D

[1] Lindahl T., Sedgwick B. Annu. Rev. Biochem. 57:133-157(1988).
[2] Berdal K.G., Bjoras M., Bjelland S., Seeberg E. EMBO J. 9:4563-4568(1990).
[3] Chen J., Derfler B., Samson L. EMBO J. 9:4569-4575(1990).

28. Ammonium transporters signature 

[0217] A number of proteins involved in the transport of ammonium ions across a membrane, as well as some yet
uncharacterized proteins, have been shown [1,2] to be evolutionary related. These proteins are: - Yeast ammonium
transporters MEP1, MEP2 and MEP3. - Arabidopsis thaliana high affinity ammonium transporter (gene AMT1).
- Corynebacterium glutamicum ammonium and methylammonium transport system. - Escherichia coli putative ammonium
transporter amtB. - Bacillus subtilis nrgA. - Mycobacterium tuberculosis hypothetical protein MtCY338.09c.
- Synechocystis strain PCC 6803 hypothetical proteins sll0108, sll0537 and sll1017. - Methanococcus jannaschii
hypothetical proteins MJ0058 and MJ1343. - Caenorhabditis elegans hypothetical proteins C05E11.4, F49E11.3 and
M195.3. As expected by their transport function, these proteins are highly hydrophobic and seem to contain from 10 to
12 transmembrane domains. The best conserved region seems to be located in the fifth (or sixth) transmembrane region
and is used as a signature pattern.

Consensus pattern: D-[FWS]-A-G-[GSC]-x(2)-[IV]-x(3)-[SAG](2)-x(2)-[SAG]-[LIVMF]-x(3)-[LIVMFYW](2)-x-[GK]-x-R

[1] Ninnemann O., Jauniaux J.-C., Frommer W.B. EMBO J. 13:3464-3471(1994).
[2] Siewe R.M., Weil B., Burkovski A., Eikmanns B.J., Eikmanns M., Kraemer R. J. Biol. Chem. 271:5398-5403
(1996).
[3] Saier M.H. Jr. Adv. Microbiol. Physiol. 40:81-136(1998).

29. (Arch histone)
CBF/NF-Y subunits signatures

[0218] Diverse DNA binding proteins are known to bind the CCAAT box, a common cis-acting element found in the
promoter and enhancer regions of a large number of genes in eukaryotes. Amongst these proteins is one known as
the CCAAT-binding factor (CBF) or NF-Y. CBF is a heteromeric transcription factor that consists of two different
components, both needed for DNA-binding. The HAP protein complex of yeast binds to the upstream activation site of
the cytochrome C iso-1 gene (CYC1) as well as other genes involved in mitochondrial electron transport and activates
their expression. It also recognizes the sequence CCAAT and is structurally and evolutionary related to CBF. The first
subunit of CBF, known as CBF-A or NF-YB in vertebrates, HAP3 in budding yeast and php3 in fission yeast, is a
protein of 116 to 210 amino-acid residues which contains a highly conserved central domain of about 90 residues. This
domain seems to be involved in DNA-binding; a signature pattern has been developed from its central part. The second
subunit of CBF, known as CBF-B or NF-YA in vertebrates, HAP2 in budding yeast and php2 in fission yeast, is a protein
of 265 to 350 amino-acid residues which contains a highly conserved region of about 60 residues. This region, called
the 'essential core' [2], seems to consist of two subdomains: an N-terminal subunit-association domain and a C-terminal
DNA recognition domain. A signature pattern has been developed from a section of the subunit-association domain.
Consensus pattern: C-V-S-E-x-I-S-F-[LIVM]-T-[SG]-E-A-[SC]-[DE]-[KRQ]-C

Consensus pattern: Y-V-N-A-K-Q-Y-x-R-I-L-K-R-R-x-A-R-A-K-L-E

[1] Li X.-Y., Mantovani R., Hooft van Huijsduijnen R., Andre I., Benoist C., Mathis D. Nucleic Acids Res. 20:
1087-1091(1992).
[2] Olesen J.T., Fikes J.D., Guarente L. Mol. Cell. Biol. 11:611-619(1991).

30. Argininosuccinate synthase signatures 

[0219] Argininosuccinate synthase (EC 6.3.4.5) (AS) is a urea cycle enzyme that catalyzes the penultimate step in
arginine biosynthesis: the ATP-dependent ligation of citrulline to aspartate to form argininosuccinate, AMP and
pyrophosphate [1,2]. In humans, a defect in the AS gene causes citrullinemia, a genetic disease characterized by severe
vomiting spells and mental retardation. AS is a homotetrameric enzyme of chains of about 400 amino-acid residues.
An arginine seems to be important for the enzyme's catalytic mechanism. The sequences of AS from various
prokaryotes, archaebacteria and eukaryotes show significant similarity. Two signature patterns have been selected for
AS. The first is a highly conserved stretch of nine residues located in the N-terminal extremity of these enzymes; the
second is derived from a conserved region which contains one of the conserved arginine residues.
Consensus pattern: [AS]-[FY]-S-G-G-[LV]-D-T-[ST]
Consensus pattern: G-x-T-x-[KR]-G-N-D-x(2)-R-F

[1] van Vliet F., Crabeel M., Boyen A., Tricot C., Stalon V., Falmagne P., Nakamura Y., Baumberg S., Glansdorff
N. Gene 95:99-104(1990).
[2] Morris C.J., Reeve J.N. J. Bacteriol. 170:3125-3130(1988).

31. Armadillo/beta-catenin-like repeats

[0220] Approx. 40 amino acid repeat. Tandem repeats form a super-helix of helices that is proposed to mediate
interaction of beta-catenin with its ligands. CAUTION: This family does not contain all known armadillo repeats.

[1] Huber AH, Nelson WJ, Weis WI. Cell 1997;90:871-882.
[2] Gumbiner BM. Curr Opin Cell Biol 1995;7:634-640.
[3] Cavallo R, Rubenstein D, Peifer M. Curr Opin Genet Dev 1997;7:459-466.
[4] Su LK, Vogelstein B, Kinzler KW. Science 1993;262:1734-1737.
[5] Masiarz FR, Munemitsu S, Polakis P. Science 1993;262:1731-1734.
[6] Peifer M, Wieschaus E. Cell 1990;63:1167-1176.

32. (Asn Synthase) 
Asparagine synthase 

[0221] This family is always found associated with GATase_2. Members of this family catalyse the conversion of
aspartate to asparagine.

33. Asparaginase_2

Asparaginase. Number of members: 12

34. (Aspartyl tRNA N)

Aminoacyl-transfer RNA synthetases class-II signatures

[0222] Aminoacyl-tRNA synthetases (EC 6.1.1.-) [1] are a group of enzymes which activate amino acids and transfer
them to specific tRNA molecules as the first step in protein biosynthesis. In prokaryotic organisms there are at least
twenty different types of aminoacyl-tRNA synthetases, one for each different amino acid. In eukaryotes there are
generally two aminoacyl-tRNA synthetases for each different amino acid: one cytosolic form and a mitochondrial form.
While all these enzymes have a common function, they are widely diverse in terms of subunit size and of quaternary
structure. The synthetases specific for alanine, asparagine, aspartic acid, glycine, histidine, lysine, phenylalanine,
proline, serine and threonine are referred to as class-II synthetases [2 to 6] and probably have a common folding
pattern in their catalytic domain for the binding of ATP and amino acid which is different to the Rossmann fold observed
for the class I synthetases [7]. Class-II tRNA synthetases do not share a high degree of similarity; however, at least
three conserved regions are present [2,5,8]. Signature patterns have been derived from two of these regions.
Consensus pattern: [FYH]-R-x-[DE]-x(4,12)-[RH]-x(3)-F-x(3)-[DE]

Consensus pattern: [GSTALVF]-{DENQHRKP}-[GSTA]-[LIVMF]-[DE]-R-[LIVMF]-x-[LIVMSTAG]-[LIVMFY]

[1] Schimmel P. Annu. Rev. Biochem. 56:125-158(1987).
[2] Delarue M., Moras D. BioEssays 15:675-687(1993).
[3] Schimmel P. Trends Biochem. Sci. 16:1-3(1991).
[4] Nagel G.M., Doolittle R.F. Proc. Natl. Acad. Sci. U.S.A. 88:8121-8125(1991).
[5] Cusack S., Haertlein M., Leberman R. Nucleic Acids Res. 19:3489-3498(1991).
[6] Cusack S. Biochimie 75:1077-1081(1993).
[7] Cusack S., Berthet-Colominas C., Haertlein M., Nassar N., Leberman R. Nature 347:249-255(1990).
[8] Leveque F., Plateau P., Dessen P., Blanquet S. Nucleic Acids Res. 18:305-312(1990).

35. (ArfGap) Putative GTP-ase activating protein for Arf. Putative zinc fingers with GTPase activating proteins (GAPs)
towards the small GTPase, Arf. The GAP of ARD1 stimulates GTPase hydrolysis for ARD1 but not ARFs. Number of
members: 34

[0223] [1] Medline: 96324970. Identification and cloning of centaurin-alpha. A novel phosphatidylinositol
3,4,5-trisphosphate-binding protein from rat brain. Hammonds-Odie LP, Jackson TR, Profit AA, Blader IJ, Turck CW,
Prestwich GD, Theibert AB; J Biol Chem 1996;271:18859-18868.
[2] Medline: 97208423. A target of phosphatidylinositol 3,4,5-trisphosphate with a zinc finger motif similar to that
of the ADP-ribosylation-factor GTPase-activating protein and two pleckstrin homology domains. Tanaka K,
Imajoh-Ohmi S, Sawada T, Shirai R, Hashimoto Y, Iwasaki S, Kaibuchi K, Kanaho Y, Shirai T, Terada Y, Kimura K,
Nagata S, Fukui Y; Eur J Biochem 1997;245:512-519.
[3] Medline: 98112795. Molecular characterization of the GTPase-activating domain of ADP-ribosylation factor
domain protein 1 (ARD1). Vitale N, Moss J, Vaughan M; J Biol Chem 1998;273:2553-2560.

36. Apolipoprotein. Apolipoprotein A1/A4/E family. This family includes: Swiss:P02647 Apolipoprotein A-I, Swiss:
P06727 Apolipoprotein A-IV, Swiss:P02649 Apolipoprotein E. These proteins contain several 22 residue repeats which
form a pair of alpha helices. Number of members: 42

[0224] [1] Medline: 91289138. Three-dimensional structure of the LDL receptor-binding domain of human
apolipoprotein E. Wilson C, Wardell MR, Weisgraber KH, Mahley RW, Agard DA; Science 1991;252:1817-1822.

37. Amino acid permeases signature 

[0225] Amino acid permeases are integral membrane proteins involved in the transport of amino acids into the cell.
A number of such proteins have been found to be evolutionary related [1,2,3]. These proteins are: - Yeast general
amino acid permeases (genes GAP1, AGP2 and AGP3). - Yeast basic amino acid permease (gene ALP1). - Yeast
Leu/Val/Ile permease (gene BAP2). - Yeast arginine permease (gene CAN1). - Yeast dicarboxylic amino acid permease
(gene DIP5). - Yeast asparagine/glutamine permease (gene AGP1). - Yeast glutamine permease (gene GNP1). - Yeast
histidine permease (gene HIP1). - Yeast lysine permease (gene LYP1). - Yeast proline permease (gene PUT4). - Yeast
valine and tyrosine permease (gene VAL1/TAT1). - Yeast tryptophan permease (gene TAT2/SCM2). - Yeast choline
transport protein (gene HNM1/CTR1). - Yeast GABA permease (gene UGA4). - Yeast hypothetical protein YKL174c.
- Fission yeast protein isp5. - Fission yeast hypothetical protein SpAC8A4.11. - Fission yeast hypothetical protein
SpAC11D3.08c. - Emericella nidulans proline transport protein (gene prnB). - Trichoderma harzianum amino acid
permease INDA1. - Salmonella typhimurium L-asparagine permease (gene ansP). - Escherichia coli aromatic amino acid
transport protein (gene aroP). - Escherichia coli D-serine/D-alanine/glycine transporter (gene cycA). - Escherichia coli
GABA permease (gene gabP). - Escherichia coli lysine-specific permease (gene lysP). - Escherichia coli
phenylalanine-specific permease (gene pheP). - Salmonella typhimurium proline-specific permease (gene proY).
- Escherichia coli and Klebsiella pneumoniae hypothetical protein yeeF. - Escherichia coli and Salmonella typhimurium
hypothetical protein yifK. - Bacillus subtilis permeases rocC and rocE, which probably transport arginine or ornithine.
These proteins seem to contain up to 12 transmembrane segments. As a signature for this family of proteins, the best
conserved region, which is located in the second transmembrane segment, has been selected.
Consensus pattern: [STAGC]-G-[PAG]-x(2,3)-[LIVMFYWA](2)-x-[LIVM ... [LIVMFYWT]-x-[LIVMST]-x(3)-[LIVMCTA]-[GA]-E-x(5)-[PSAL]

[1] Weber E., Chevalier M.R., Jund R. J. Mol. Evol. 27:341-350(1988).
[2] Vandenbol M., Jauniaux J.-C., Grenson M. Gene 83:153-159(1989).
[3] Reizer J., Finley K., Kakuda D., McLeod C.L., Reizer A., Saier M.H. Jr. Protein Sci. 2:20-30(1993).

38. aakinase (1) Glutamate 5-kinase signature

[0226] Glutamate 5-kinase (EC 2.7.2.11) (gamma-glutamyl kinase) (GK) is the enzyme that catalyzes the first step
in the biosynthesis of proline from glutamate, the ATP-dependent phosphorylation of L-glutamate into L-glutamate
5-phosphate. In eubacteria (gene proB) and yeast [1] (gene PRO1), GK is a monofunctional protein, while in plants
and mammals, it is a bifunctional enzyme (P5CS) [2] that consists of two domains: a N-terminal GK domain and a
C-terminal gamma-glutamyl phosphate reductase domain (EC 1.2.1.41) (see <PDOC00940>). As a signature pattern, a
highly conserved glycine- and alanine-rich region located in the central section of these enzymes has been selected.
Yeast hypothetical protein YHR033w is highly similar to GK.

Consensus pattern: [GSTN]-x(2)-G-x-G-[GC]-[IM]-x-[STA]-K-[LIVM]-x-[SA]-[TCA]-x(2)-[GALV]-x(3)-G

[1] Li W., Brandriss M.C. J. Bacteriol. 174:4148-4156(1992).
[2] Hu C.-A.A., Delauney A.J., Verma D.P.S. Proc. Natl. Acad. Sci. U.S.A. 89:9354-9358(1992).

aakinase (2) Aspattokmase signature 

[0227] Aspartokinase (EC 2.7.2.4) (AK) [1] catalyzes the phosphorylation of aspartate. The product of this reaction
can then be used in the biosynthesis of lysine or in the pathway leading to homoserine, which participates in the
biosynthesis of threonine, isoleucine and methionine. In Escherichia coli, there are three different isozymes which differ
in their sensitivity to repression and inhibition by Lys, Met and Thr. AK1 (gene thrA) and AK2 (gene metL) are bifunctional
enzymes which both consist of an N-terminal AK domain and a C-terminal homoserine dehydrogenase domain. AK1
is involved in threonine biosynthesis and AK2 in that of methionine. The third isozyme, AK3 (gene lysC), is
monofunctional and involved in lysine synthesis. In yeast, there is a single isozyme of AK (gene HOM3). As a signature
pattern for AK, a conserved region located in the N-terminal extremity has been selected.
Consensus pattern: [LIVM]-x-K-[FY]-G-G-[ST]-[SC]-[LIVM]
[ 1] Rafalski J.A., Falco S.C. J. Biol. Chem. 263:2146-2151(1988).
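
The consensus patterns in these entries follow PROSITE-style syntax: elements are separated by hyphens, x stands for any residue, square brackets list the residues allowed at a position, curly braces list residues excluded at a position, and a number in parentheses is a repeat count. As a minimal sketch, and not part of the original disclosure, the Python below translates such a pattern into a regular expression and scans a sequence with it. The names `prosite_to_regex` and `find_signature` are hypothetical, the aspartokinase pattern is written in its cleaned-up form, and the test sequence is invented for illustration.

```python
import re

# One pattern element: optional '<' anchor, a core (x, a residue, [allowed] or
# {excluded}), an optional repeat count (n) or (n,m), and an optional '>' anchor.
_ELEMENT = re.compile(
    r"^(<?)(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?(>?)$"
)


def prosite_to_regex(pattern):
    """Translate a PROSITE-style consensus pattern into a Python regex string."""
    pieces = []
    # Stray leading or trailing hyphens and a final period are ignored.
    for element in pattern.strip().rstrip(".").strip("-").split("-"):
        match = _ELEMENT.match(element.strip())
        if match is None:
            raise ValueError(f"unrecognised pattern element: {element!r}")
        start, core, low, high, end = match.groups()
        if core == "x":
            core = "."                      # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # excluded residues
        if low is not None:
            core += "{" + low + ("," + high if high is not None else "") + "}"
        pieces.append(("^" if start else "") + core + ("$" if end else ""))
    return "".join(pieces)


def find_signature(sequence, pattern):
    """Return (start, end, matched_text) tuples with 1-based inclusive positions."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start() + 1, m.end(), m.group()) for m in regex.finditer(sequence)]


# Aspartokinase signature, cleaned-up form.
ASPARTOKINASE = "[LIVM]-x-K-[FY]-G-G-[ST]-[SC]-[LIVM]"

# Hypothetical test sequence, invented for this example only.
hits = find_signature("MAVVKFGGSSLAD", ASPARTOKINASE)
print(prosite_to_regex(ASPARTOKINASE))
print(hits)
```

Note that `re.finditer` reports non-overlapping matches only; a scanner that must report overlapping hits would need a lookahead-based variant.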

aakinase (3) Gamma-glutamyl phosphate reductase signature

[0228] Gamma-glutamyl phosphate reductase (EC 1.2.1.41) (GPR) is the enzyme that catalyzes the second step in
the biosynthesis of proline from glutamate, the NADP-dependent reduction of L-glutamate 5-phosphate into
L-glutamate 5-semialdehyde and phosphate. In eubacteria (gene proA) and yeast [1] (gene PRO2), GPR is a monofunctional
protein, while in plants and mammals, it is a bifunctional enzyme (P5CS) [2] that consists of two domains: an N-terminal
glutamate 5-kinase domain (EC 2.7.2.11) (see <PDOC00701>) and a C-terminal GPR domain. As a signature pattern,
a conserved region that contains two histidine residues has been selected. This region is located in the last third of GPR.
Consensus pattern: V-x(5)-A-[LIV]-x-H-I-x(2)-[HY]-[GS]-[ST]-x-H-[ST]-[DE]-x-I

[ 1] Pearson B.M., Hernando Y., Payne J., Wolf S.S., Kalogeropoulos A., Schweizer M. Yeast 12:1021-1031(1996).
[ 2] Hu C.-A.A., Delauney A.J., Verma D.P.S. Proc. Natl. Acad. Sci. U.S.A. 89:9354-9358(1992).

39. (abhydrolase) alpha/beta hydrolase fold. This catalytic domain is found in a very wide range of enzymes.

[0229] [1] Ollis DL, Cheah E, Cygler M, Dijkstra B, Frolow F, Franken SM, Harel M, Remington SJ, Silman I, Schrag
J, Sussman JL, Verschueren KHG, Goldman A. Protein Eng 1992;5:197-211

40. (Acid_phosphat) Histidine acid phosphatases signatures

[0230] Acid phosphatases (EC 3.1.3.2) are a heterogeneous group of proteins that hydrolyze phosphate esters,
optimally at low pH. It has been shown [1] that a number of acid phosphatases, from both prokaryotes and eukaryotes,
share two regions of sequence similarity, each centered around a conserved histidine residue. These two histidines
seem to be involved in the enzymes' catalytic mechanism [2,3]. The first histidine is located in the N-terminal section
and forms a phosphohistidine intermediate while the second is located in the C-terminal section and possibly acts as
proton donor. Enzymes belonging to this family are called 'histidine acid phosphatases' and are listed below:

- Escherichia coli pH 2.5 acid phosphatase (gene appA).
- Escherichia coli glucose-1-phosphatase (EC 3.1.3.10) (gene agp).
- Yeast constitutive and repressible acid phosphatases (genes PHO3 and PHO5).
- Fission yeast acid phosphatase (gene pho1).
- Aspergillus phytases A and B (EC 3.1.3.8) (gene phyA and phyB).
- Mammalian lysosomal acid phosphatase.
- Mammalian prostatic acid phosphatase.
- Caenorhabditis elegans hypothetical proteins B0361.7, C05C10.1, C05C10.4 and F26C11.1.

[0231] Consensus pattern: [LIVM]-x(2)-[LIVMA]-x(2)-[LIVM]-x-R-H-[GN]-x-R-x-[PAS] [H is the phosphohistidine
residue]

Consensus pattern: [LIVMF]-x-[LIVMFAG]-x(2)-[STAGI]-H-D-[STANQ]-x-[LIVM]-x(2)-[LIVMFY]-x(2)-[STA] [H is an
active site residue] Sequences known to belong to this class detected by the pattern: ALL, except for rat prostatic acid
phosphatase which seems to have Tyr instead of the active site His.

[ 1] van Etten R.L., Davidson R., Stevis P.E., MacArthur H., Moore D.L. J. Biol. Chem. 266:2313-2319(1991).
[ 2] Ostanin K., Harms E.H., Stevis P.E., Kuciel R., Zhou M.-M., van Etten R.L. J. Biol. Chem. 267:22830-22836(1992).
[ 3] Schneider G., Lindqvist Y., Vihko P. EMBO J. 12:2609-2615(1993).

41. Aconitase family signatures 

[0232] Aconitase (aconitate hydratase) (EC 4.2.1.3) [1] is the enzyme from the tricarboxylic acid cycle that catalyzes
the reversible isomerization of citrate and isocitrate. Cis-aconitate is formed as an intermediary product during the
course of the reaction. In eukaryotes two isozymes of aconitase are known to exist: one found in the mitochondrial
matrix and the other found in the cytoplasm. Aconitase, in its active form, contains a 4Fe-4S iron-sulfur cluster; three
cysteine residues have been shown to be ligands of the 4Fe-4S cluster. It has been shown that the aconitase family
also contains the following proteins:

- Iron-responsive element binding protein (IRE-BP). IRE-BP is a cytosolic protein that binds to iron-responsive
  elements (IREs). IREs are stem-loop structures found in the 5'UTR of ferritin and delta-aminolevulinic acid
  synthase mRNAs, and in the 3'UTR of transferrin receptor mRNA. IRE-BP also expresses aconitase activity.
- 3-isopropylmalate dehydratase (EC 4.2.1.33) (isopropylmalate isomerase), the enzyme that catalyzes the
  second step in the biosynthesis of leucine.
- Homoaconitase (EC 4.2.1.36) (homoaconitate hydratase), an enzyme that participates in the alpha-aminoadipate
  pathway of lysine biosynthesis and that converts cis-homoaconitate into homoisocitric acid.
- Escherichia coli protein ybhJ.

Consensus pattern: [LIVM]-x(2)-[GSACIVM]-x-[LIV]-[GTIV]-[STP]-C-x(0,1)-T-N-[GSTANI]-x(4)-[LIVMA] [C binds the
iron-sulfur center]

Consensus pattern: G-x(2)-[LIVWPQ]-x(3)-[GAC]-C-[GSTAM]-[LIMPTA]-C-[LIMV]-[GA] [The two C's bind the
iron-sulfur center]

[ 1] Gruer M.J., Artymiuk P.J., Guest J.R. Trends Biochem. Sci. 22:3-6(1997).

42. Actins signatures 


[0233] Actins [1 to 4] are highly conserved contractile proteins that are present in all eukaryotic cells. In vertebrates
there are three groups of actin isoforms: alpha, beta and gamma. The alpha actins are found in muscle tissues and
are a major constituent of the contractile apparatus. The beta and gamma actins co-exist in most cell types as
components of the cytoskeleton and as mediators of internal cell motility. In plants [5] there are many isoforms which are
probably involved in a variety of functions such as cytoplasmic streaming, cell shape determination, tip growth,
graviperception, cell wall deposition, etc. Actin exists either in a monomeric form (G-actin) or in a polymerized form (F-actin).
Each actin monomer can bind a molecule of ATP; when polymerization occurs, the ATP is hydrolyzed. Actin is a protein
of from 374 to 379 amino acid residues. The structure of actin has been highly conserved in the course of evolution.
Recently some divergent actin-like proteins have been identified in several species. These proteins are:

- Centractin (actin-RPV) from mammals, fungi (yeast ACT5, Neurospora crassa ro-4) and Pneumocystis carinii
  (actin-II). Centractin seems to be a component of a multi-subunit centrosomal complex involved in microtubule
  based vesicle motility. This subfamily is also known as ARP1.
- ARP2 subfamily which includes chicken ACTL, yeast ACT2, Drosophila 14D, C. elegans actC.
- ARP3 subfamily which includes actin 2 from mammals, Drosophila 66B, yeast ACT4 and fission yeast act2.
- ARP4 subfamily which includes yeast ACT3 and Drosophila 13E.

Three signature patterns have been developed. The first two are specific to actins and span positions 54 to 64 and
357 to 365. The last signature picks up both actins and the actin-like proteins and corresponds to positions 108 to
118 in actins.
Consensus pattern: [FY]-[LIV]-G-[DE]-E-A-Q-x-[RKQ](2)-G
Consensus pattern: W-[IV]-[STA]-[RK]-x-[DE]-Y-[DNE]-[DE]
Consensus pattern: [LM]-[LIVM]-T-E-[GAPQ]-x-[LIVMFYWHQ]-N-[PSTAQ]-x(2)-N-[KR]

[ 1] Sheterline P., Clayton J., Sparrow J.C. (In) Actins, 3rd Edition, Academic Press Ltd, London, (1996).
[ 2] Pollard T.D., Cooper J.A. Annu. Rev. Biochem. 55:987-1036(1986).
[ 3] Pollard T.D. Curr. Opin. Cell Biol. 1:33-40(1990).



[ 4] Rubenstein P.A. BioEssays 12:309-315(1990).
[ 5] Meagher R.B., McLean B.G. Cell Motil. Cytoskeleton 16:164-166(1990).

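
The actin entry distinguishes two signatures that are specific to actins from a third that also picks up the actin-like proteins. A minimal sketch of how that distinction could be applied is given below, assuming the three patterns in their cleaned-up form and hand-translated into Python regular expressions. The function `classify`, its decision rule (either actin-specific signature is enough to call a sequence an actin) and the test fragments are assumptions made for this example, not something the entry specifies.

```python
import re

# Hand-translated regexes for the three actin signatures (x -> ".", (2) -> "{2}").
ACTINS_1 = re.compile(r"[FY][LIV]G[DE]EAQ.[RKQ]{2}G")
ACTINS_2 = re.compile(r"W[IV][STA][RK].[DE]Y[DNE][DE]")
ACTIN_LIKE = re.compile(r"[LM][LIVM]TE[GAPQ].[LIVMFYWHQ]N[PSTAQ].{2}N[KR]")


def classify(sequence):
    """Label a sequence by the signatures it carries (illustrative rule only)."""
    specific = bool(ACTINS_1.search(sequence)) or bool(ACTINS_2.search(sequence))
    shared = bool(ACTIN_LIKE.search(sequence))
    if specific:
        return "actin"
    if shared:
        return "actin-like"
    return "no actin signature"


# Hypothetical fragments, invented only to exercise each branch.
actin_fragment = "AAYVGDEAQSKRGAALLTEAPLNPKANRAAWISKQEYDEAA"
actin_like_fragment = "GGLLTEAPLNPKANRGG"

print(classify(actin_fragment))
print(classify(actin_like_fragment))
print(classify("GGGG"))
```

A signature hit is only an indication of family membership; a real annotation pipeline would normally combine it with other evidence.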
43. Adenylate kinase signature 


[0234] Adenylate kinase (EC 2.7.4.3) (AK) [1] is a small monomeric enzyme that catalyzes the reversible transfer of
a phosphoryl group from MgATP to AMP (MgATP + AMP = MgADP + ADP). In mammals there are three different isozymes:

- AK1 (or myokinase), which is cytosolic.
- AK2, which is located in the outer compartment of mitochondria.
- AK3 (or GTP:AMP phosphotransferase), which is located in the mitochondrial matrix and which uses MgGTP
  instead of MgATP.

The sequence of AK has also been obtained from different bacterial species and from plants and fungi. Two other
enzymes have been found to be evolutionary related to AK. These are:

- Yeast uridylate kinase (EC 2.7.4.-) (UK) (gene URA6) [2] which catalyzes the transfer of a phosphate group from
  ATP to UMP to form UDP and ADP.
- Slime mold UMP-CMP kinase (EC 2.7.4.14) [3] which catalyzes the transfer of a phosphate group from ATP to
  either CMP or UMP to form CDP or UDP and ADP.

Several regions of AK family enzymes are well conserved, including the ATP-binding domains. The most conserved
of all regions has been selected as a signature for this type of enzyme. This region includes an aspartic acid residue
that is part of the catalytic cleft of the enzyme and that is involved in a salt bridge. It also includes an arginine
residue whose modification leads to inactivation of the enzyme.
Consensus pattern: [LIVMFYW](3)-D-G-[FYI]-P-R-x(3)-[NQ]

[ 1] Schulz G.E. Cold Spring Harbor Symp. Quant. Biol. 52:429-439(1987).
[ 2] Liljelund P., Sanni A., Friesen J.D., Lacroute F. Biochem. Biophys. Res. Commun. 165:464-473(1989).
[ 3] Wiesmueller L., Noegel A.A., Barzu O., Gerisch G., Schleicher M. J. Biol. Chem. 265:6339-6345(1990).
[ 4] Kath T.H., Schmid R., Schaefer G. Arch. Biochem. Biophys. 307:405-410(1993).

[0235] 44. (adh_short) Short-chain dehydrogenases/reductases family signature.

[0236] The short-chain dehydrogenases/reductases family (SDR) [1] is a very large family of enzymes, most of which
are known to be NAD- or NADP-dependent oxidoreductases. As the first member of this family to be characterized
was Drosophila alcohol dehydrogenase, this family used to be called [2,3,4] 'insect-type', or 'short-chain' alcohol
dehydrogenases. Most members of this family are proteins of about 250 to 300 amino acid residues. The proteins
currently known to belong to this family are listed below:

- Alcohol dehydrogenase (EC 1.1.1.1) from insects such as Drosophila.
- Acetoin dehydrogenase (EC 1.1.1.5) from Klebsiella terrigena (gene budC).
- D-beta-hydroxybutyrate dehydrogenase (BDH) (EC 1.1.1.30) from mammals.
- Acetoacetyl-CoA reductase (EC 1.1.1.36) from various bacterial species (gene phbB or phaB).
- Glucose 1-dehydrogenase (EC 1.1.1.47) from Bacillus.
- 3-beta-hydroxysteroid dehydrogenase (EC 1.1.1.51) from Comamonas testosteroni.
- 20-beta-hydroxysteroid dehydrogenase (EC 1.1.1.53) from Streptomyces hydrogenans.
- Ribitol dehydrogenase (EC 1.1.1.56) (RDH) from Klebsiella aerogenes.
- Estradiol 17-beta-dehydrogenase (EC 1.1.1.62) from human.
- Gluconate 5-dehydrogenase (EC 1.1.1.69) from Gluconobacter oxydans (gene gno).
- 3-oxoacyl-[acyl-carrier protein] reductase (EC 1.1.1.100) from Escherichia coli (gene fabG) and from plants.
- Retinol dehydrogenase (EC 1.1.1.105) from mammals.
- 2-deoxy-D-gluconate 3-dehydrogenase (EC 1.1.1.125) from Escherichia coli and Erwinia chrysanthemi (gene kduD).
- Sorbitol-6-phosphate 2-dehydrogenase (EC 1.1.1.140) from Escherichia coli (gene gutD) and from Klebsiella
  pneumoniae (gene sorD).
- 15-hydroxyprostaglandin dehydrogenase (NAD+) (EC 1.1.1.141) from human.
- Corticosteroid 11-beta-dehydrogenase (EC 1.1.1.146) (11-DH) from mammals.
- 7-alpha-hydroxysteroid dehydrogenase (EC 1.1.1.159) from Escherichia coli (gene hdhA), Eubacterium strain VPI
  12708 (gene baiA) and from Clostridium sordellii.
- NADPH-dependent carbonyl reductase (EC 1.1.1.184) from mammals.
- Tropinone reductase-I (EC 1.1.1.206) and -II (EC 1.1.1.236) from plants.
- N-acylmannosamine 1-dehydrogenase (EC 1.1.1.233) from Flavobacterium strain 141-8.
- D-arabinitol 2-dehydrogenase (ribulose forming) (EC 1.1.1.250) from fungi.
- Tetrahydroxynaphthalene reductase (EC 1.1.1.252) from Magnaporthe grisea.
- Pteridine reductase 1 (EC 1.1.1.253) (gene PTR1) from Leishmania.
- 2,5-dichloro-2,5-cyclohexadiene-1,4-diol dehydrogenase (EC 1.1.-.-) from Pseudomonas paucimobilis.
- Cis-1,2-dihydroxy-3,4-cyclohexadiene-1-carboxylate dehydrogenase (EC 1.3.1.-) from Acinetobacter calcoaceticus
  (gene benD) and Pseudomonas putida (gene xylL).
- Biphenyl-2,3-dihydro-2,3-diol dehydrogenase (EC 1.3.1.-) (gene bphB) from various Pseudomonaceae.
- Cis-toluene dihydrodiol dehydrogenase (EC 1.3.1.-) from Pseudomonas putida (gene todD).
- Cis-benzene glycol dehydrogenase (EC 1.3.1.19) from Pseudomonas putida (gene bnzE).
- 2,3-dihydro-2,3-dihydroxybenzoate dehydrogenase (EC 1.3.1.28) from Escherichia coli (gene entA) and Bacillus
  subtilis (gene dhbA).
- Dihydropteridine reductase (EC 1.6.99.7) (HDHPR) from mammals.
- Lignin degradation enzyme ligD from Pseudomonas paucimobilis.
- Agropine synthesis reductase from Agrobacterium plasmids (gene mas1).
- Versicolorin reductase from Aspergillus parasiticus (gene VER1).
- Putative keto-acyl reductases from Streptomyces polyketide biosynthesis operons.
- A trifunctional hydratase-dehydrogenase-epimerase from the peroxisomal beta-oxidation system of Candida
  tropicalis. This protein contains two tandemly repeated 'short-chain dehydrogenase-type' domains in its
  N-terminal extremity.
- Nodulation protein nodG from species of Azospirillum and Rhizobium which is probably involved in the
  modification of the nodulation Nod factor fatty acyl chain.
- Nitrogen fixation protein fixR from Bradyrhizobium japonicum.
- Bacillus subtilis protein dltE which is involved in the biosynthesis of D-alanyl-lipoteichoic acid.
- Human follicular variant translocation protein 1 (FVT1).
- Mouse adipocyte protein p27.
- Mouse protein Ke 6.
- Maize sex determination protein TASSELSEED 2.
- Sarcophaga peregrina 25 Kd development specific protein.
- Drosophila fat body protein P6.
- A Listeria monocytogenes hypothetical protein encoded in the internalins gene region.
- Escherichia coli hypothetical protein yciK.
- Escherichia coli hypothetical protein ydfG.
- Escherichia coli hypothetical protein yjgI.
- Escherichia coli hypothetical protein yjgU.
- Escherichia coli hypothetical protein yohF.
- Bacillus subtilis hypothetical protein yoxD.
- Bacillus subtilis hypothetical protein ywfD.
- Bacillus subtilis hypothetical protein ywfH.
- Yeast hypothetical protein YIL124w.
- Yeast hypothetical protein YIR035c.
- Yeast hypothetical protein YIR036c.
- Yeast hypothetical protein YKL055c.
- Fission yeast hypothetical protein SpAC23D3.11.

One of the best conserved regions, which includes two perfectly conserved residues, a tyrosine and a lysine, has
been selected as a signature pattern for this family of proteins. The tyrosine residue participates in the
catalytic mechanism.

Consensus pattern: [LIVSPADNK]-x(12)-Y-[PSTAGNCV]-[STAGNQCIVM]-[STAGC]-K-{PC}-[SAGFYR]-[LIVMSTAGD]-
x(2)-[LIVMFYW]-x(3)-[LIVMFYWGAPTHQ]-[GSACQRHM] [Y is an active site residue]

[ 1] Joernvall H., Persson B., Krook M., Atrian S., Gonzalez-Duarte R., Jeffery J., Ghosh D. Biochemistry
34:6003-6013(1995).
[ 2] Villarroya A., Juan E., Egestad B., Joernvall H. Eur. J. Biochem. 180:191-197(1989).
[ 3] Persson B., Krook M., Joernvall H. Eur. J. Biochem. 200:537-543(1991).
[ 4] Neidle E.L., Hartnett C., Ornston N.L., Bairoch A., Rekik M., Harayama S. Eur. J. Biochem. 204:113-120(1992).
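
The short-chain dehydrogenase/reductase signature marks its tyrosine as an active site residue and uses the exclusion element {PC}, meaning any residue except proline or cysteine. As a minimal sketch under stated assumptions, the Python below hand-translates the cleaned-up pattern into a regular expression, wraps the tyrosine in a named group, and reports its 1-based position in each match. The function name `active_site_tyrosines` and both test sequences are hypothetical and were built only to exercise the pattern.

```python
import re

# Hand-translated regex for the short-chain dehydrogenase/reductase signature.
# The named group "tyr" marks the conserved tyrosine; [^PC] encodes {PC}.
ADH_SHORT = re.compile(
    r"[LIVSPADNK].{12}(?P<tyr>Y)[PSTAGNCV][STAGNQCIVM][STAGC]K"
    r"[^PC][SAGFYR][LIVMSTAGD].{2}[LIVMFYW].{3}"
    r"[LIVMFYWGAPTHQ][GSACQRHM]"
)


def active_site_tyrosines(sequence):
    """Return the 1-based position of the signature tyrosine for each match."""
    return [m.start("tyr") + 1 for m in ADH_SHORT.finditer(sequence)]


# Hypothetical sequences, invented for this example only.
matching = "MM" + "L" + "G" * 12 + "YSASKAALEELEEEAG"
excluded = "MM" + "L" + "G" * 12 + "YSASKPALEELEEEAG"  # P right after the lysine

print(active_site_tyrosines(matching))
print(active_site_tyrosines(excluded))
```

The second sequence differs from the first only by a proline at the excluded position, which is enough to prevent a match.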

[0237] 45. (adh_short_C2) Short-chain dehydrogenases/reductases family signature

The short-chain dehydrogenases/reductases family (SDR) [1] is a very large family of enzymes, most of which are
known to be NAD- or NADP-dependent oxidoreductases. As the first member of this family to be characterized was
Drosophila alcohol dehydrogenase, this family used to be called [2,3,4] 'insect-type', or 'short-chain' alcohol
dehydrogenases. Most members of this family are proteins of about 250 to 300 amino acid residues. The proteins
currently known to belong to this family are listed below:

- Alcohol dehydrogenase (EC 1.1.1.1) from insects such as Drosophila.
- Acetoin dehydrogenase (EC 1.1.1.5) from Klebsiella terrigena (gene budC).
- D-beta-hydroxybutyrate dehydrogenase (BDH) (EC 1.1.1.30) from mammals.
- Acetoacetyl-CoA reductase (EC 1.1.1.36) from various bacterial species (gene phbB or phaB).
- Glucose 1-dehydrogenase (EC 1.1.1.47) from Bacillus.
- 3-beta-hydroxysteroid dehydrogenase (EC 1.1.1.51) from Comamonas testosteroni.
- 20-beta-hydroxysteroid dehydrogenase (EC 1.1.1.53) from Streptomyces hydrogenans.
- Ribitol dehydrogenase (EC 1.1.1.56) (RDH) from Klebsiella aerogenes.
- Estradiol 17-beta-dehydrogenase (EC 1.1.1.62) from human.
- Gluconate 5-dehydrogenase (EC 1.1.1.69) from Gluconobacter oxydans (gene gno).
- 3-oxoacyl-[acyl-carrier protein] reductase (EC 1.1.1.100) from Escherichia coli (gene fabG) and from plants.
- Retinol dehydrogenase (EC 1.1.1.105) from mammals.
- 2-deoxy-D-gluconate 3-dehydrogenase (EC 1.1.1.125) from Escherichia coli and Erwinia chrysanthemi (gene kduD).
- Sorbitol-6-phosphate 2-dehydrogenase (EC 1.1.1.140) from Escherichia coli (gene gutD) and from Klebsiella
  pneumoniae (gene sorD).
- 15-hydroxyprostaglandin dehydrogenase (NAD+) (EC 1.1.1.141) from human.
- Corticosteroid 11-beta-dehydrogenase (EC 1.1.1.146) (11-DH) from mammals.
- 7-alpha-hydroxysteroid dehydrogenase (EC 1.1.1.159) from Escherichia coli (gene hdhA), Eubacterium strain VPI
  12708 (gene baiA) and from Clostridium sordellii.
- NADPH-dependent carbonyl reductase (EC 1.1.1.184) from mammals.
- Tropinone reductase-I (EC 1.1.1.206) and -II (EC 1.1.1.236) from plants.
- N-acylmannosamine 1-dehydrogenase (EC 1.1.1.233) from Flavobacterium strain 141-8.
- D-arabinitol 2-dehydrogenase (ribulose forming) (EC 1.1.1.250) from fungi.
- Tetrahydroxynaphthalene reductase (EC 1.1.1.252) from Magnaporthe grisea.
- Pteridine reductase 1 (EC 1.1.1.253) (gene PTR1) from Leishmania.
- 2,5-dichloro-2,5-cyclohexadiene-1,4-diol dehydrogenase (EC 1.1.-.-) from Pseudomonas paucimobilis.
- Cis-1,2-dihydroxy-3,4-cyclohexadiene-1-carboxylate dehydrogenase (EC 1.3.1.-) from Acinetobacter calcoaceticus
  (gene benD) and Pseudomonas putida (gene xylL).
- Biphenyl-2,3-dihydro-2,3-diol dehydrogenase (EC 1.3.1.-) (gene bphB) from various Pseudomonaceae.
- Cis-toluene dihydrodiol dehydrogenase (EC 1.3.1.-) from Pseudomonas putida (gene todD).
- Cis-benzene glycol dehydrogenase (EC 1.3.1.19) from Pseudomonas putida (gene bnzE).
- 2,3-dihydro-2,3-dihydroxybenzoate dehydrogenase (EC 1.3.1.28) from Escherichia coli (gene entA) and Bacillus
  subtilis (gene dhbA).
- Dihydropteridine reductase (EC 1.6.99.7) (HDHPR) from mammals.
- Lignin degradation enzyme ligD from Pseudomonas paucimobilis.
- Agropine synthesis reductase from Agrobacterium plasmids (gene mas1).
- Versicolorin reductase from Aspergillus parasiticus (gene VER1).
- Putative keto-acyl reductases from Streptomyces polyketide biosynthesis operons.
- A trifunctional hydratase-dehydrogenase-epimerase from the peroxisomal beta-oxidation system of Candida
  tropicalis. This protein contains two tandemly repeated 'short-chain dehydrogenase-type' domains in its
  N-terminal extremity.
- Nodulation protein nodG from species of Azospirillum and Rhizobium which is probably involved in the
  modification of the nodulation Nod factor fatty acyl chain.
- Nitrogen fixation protein fixR from Bradyrhizobium japonicum.
- Bacillus subtilis protein dltE which is involved in the biosynthesis of D-alanyl-lipoteichoic acid.
- Human follicular variant translocation protein 1 (FVT1).
- Mouse adipocyte protein p27.
- Mouse protein Ke 6.
- Maize sex determination protein TASSELSEED 2.
- Sarcophaga peregrina 25 Kd development specific protein.
- Drosophila fat body protein P6.
- A Listeria monocytogenes hypothetical protein encoded in the internalins gene region.
- Escherichia coli hypothetical protein yciK.
- Escherichia coli hypothetical protein ydfG.
- Escherichia coli hypothetical protein yjgI.
- Escherichia coli hypothetical protein yjgU.
- Escherichia coli hypothetical protein yohF.
- Bacillus subtilis hypothetical protein yoxD.
- Bacillus subtilis hypothetical protein ywfD.
- Bacillus subtilis hypothetical protein ywfH.
- Yeast hypothetical protein YIL124w.
- Yeast hypothetical protein YIR035c.
- Yeast hypothetical protein YIR036c.
- Yeast hypothetical protein YKL055c.
- Fission yeast hypothetical protein SpAC23D3.11.

One of the best conserved regions, which includes two perfectly conserved residues, a tyrosine and a lysine, has
been used as a signature pattern for this family of proteins. The tyrosine residue participates in the catalytic
mechanism.

Consensus pattern: [LIVSPADNK]-x(12)-Y-[PSTAGNCV]-[STAGNQCIVM]-[STAGC]-K-{PC}-[SAGFYR]-[LIVMSTAGD]-
x(2)-[LIVMFYW]-x(3)-[LIVMFYWGAPTHQ]-[GSACQRHM] [Y is an active site residue]

[ 1] Joernvall H., Persson B., Krook M., Atrian S., Gonzalez-Duarte R., Jeffery J., Ghosh D. Biochemistry
34:6003-6013(1995).
[ 2] Villarroya A., Juan E., Egestad B., Joernvall H. Eur. J. Biochem. 180:191-197(1989).
[ 3] Persson B., Krook M., Joernvall H. Eur. J. Biochem. 200:537-543(1991).
[ 4] Neidle E.L., Hartnett C., Ornston N.L., Bairoch A., Rekik M., Harayama S. Eur. J. Biochem. 204:113-120(1992).
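
The family lists above identify each enzyme by its EC number, four dot-separated fields in which a hyphen stands for a level that has not been assigned (for example EC 1.3.1.-). As a small illustration, and only a sketch, the Python below parses such numbers into tuples so that partially assigned entries can be compared by class prefix. The helper names `parse_ec` and `same_subsubclass` are hypothetical.

```python
import re

# Four dot-separated fields; the first is always a number, the rest may be "-".
_EC = re.compile(r"^(?:EC\s*)?(\d+)\.(\d+|-)\.(\d+|-)\.(\d+|-)$")


def parse_ec(text):
    """Parse 'EC 1.3.1.-' into (1, 3, 1, None); None marks an unassigned level."""
    match = _EC.match(text.strip())
    if match is None:
        raise ValueError(f"not an EC number: {text!r}")
    return tuple(None if field == "-" else int(field) for field in match.groups())


def same_subsubclass(a, b):
    """True when two EC numbers agree on their first three, fully assigned, fields."""
    left, right = parse_ec(a)[:3], parse_ec(b)[:3]
    return None not in left and left == right


print(parse_ec("EC 1.1.1.1"))
print(parse_ec("EC 1.3.1.-"))
print(same_subsubclass("EC 1.3.1.19", "EC 1.3.1.-"))
```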

[0238] 46. (adh_zinc) Zinc-containing alcohol dehydrogenases signatures

Alcohol dehydrogenase (EC 1.1.1.1) (ADH) catalyzes the reversible oxidation of ethanol to acetaldehyde with the
concomitant reduction of NAD [1]. Currently three structurally and catalytically different types of alcohol
dehydrogenases are known:

- Zinc-containing 'long-chain' alcohol dehydrogenases.
- Insect-type, or 'short-chain' alcohol dehydrogenases.
- Iron-containing alcohol dehydrogenases.

Zinc-containing ADH's [2,3] are dimeric or tetrameric enzymes that bind two atoms of zinc per subunit. One of
the zinc atoms is essential for catalytic activity while the other is not. Both zinc atoms are coordinated by either
cysteine or histidine residues; the catalytic zinc is coordinated by two cysteines and one histidine. Zinc-containing
ADH's are found in bacteria, mammals, plants, and in fungi. In most species there is more than one isozyme (for
example, humans have at least six isozymes, yeast has three, etc.). A number of other zinc-dependent dehydrogenases
are closely related to zinc ADH [4]; these are:

- Xylitol dehydrogenase (EC 1.1.1.9) (D-xylulose reductase).
- Sorbitol dehydrogenase (EC 1.1.1.14).
- Aryl-alcohol dehydrogenase (EC 1.1.1.90) (benzyl alcohol dehydrogenase).
- Threonine 3-dehydrogenase (EC 1.1.1.103).
- Cinnamyl-alcohol dehydrogenase (EC 1.1.1.195) (CAD) [5]. CAD is a plant enzyme involved in the biosynthesis
  of lignin.
- Galactitol-1-phosphate dehydrogenase (EC 1.1.1.251).
- Pseudomonas putida 5-exo-alcohol dehydrogenase (EC 1.1.1.-) [6].
- Escherichia coli starvation sensing protein rspB.
- Escherichia coli hypothetical protein yjgB.
- Escherichia coli hypothetical protein yjgV.
- Escherichia coli hypothetical protein yjjN.
- Yeast hypothetical protein YAL060w (FUN49).
- Yeast hypothetical protein YAL061w (FUN50).
- Yeast hypothetical protein YCR105w.

The pattern that has been developed to detect this class of enzymes is based on a conserved region that
includes a histidine residue which is the second ligand of the catalytic zinc atom. This family also includes
NADP-dependent quinone oxidoreductase (EC 1.6.5.5), an enzyme found in bacteria (gene qor), in yeast and in mammals
where, in some species such as rodents, it has been recruited as an eye lens protein and is known as zeta-crystallin
[7]. The sequence of quinone oxidoreductase is distantly related to that of other zinc-containing alcohol dehydrogenases
and it lacks the zinc-ligand residues. The torpedo fish and mammalian synaptic vesicle membrane protein vat-1 is related
to qor. A specific pattern has been developed for this subfamily.
Consensus pattern: G-H-E-x(2)-G-x(5)-[GA]-x(2)-[IVSAC] [H is a zinc ligand]

Consensus pattern: [GSD]-[DEQH]-x(2)-L-x(3)-[SA](2)-G-G-x-G-x(4)-Q-x(2)-[KR]

[ 1] Branden C.-I., Joernvall H., Eklund H., Furugren B. (In) The Enzymes (3rd edition) 11:104-190(1975).
[ 2] Joernvall H., Persson B., Jeffery J. Eur. J. Biochem. 167:195-201(1987).
[ 3] Sun H.-W., Plapp B.V. J. Mol. Evol. 34:522-535(1992).
[ 4] Persson B., Hallborn J., Walfridsson M., Hahn-Haegerdal B., Keraenen S., Penttilae M., Joernvall H. FEBS
Lett. 324:9-14(1993).
[ 5] Knight M.E., Halpin C., Schuch W. Plant Mol. Biol. 19:793-801(1992).
[ 6] Koga H., Aramaki H., Yamaguchi E., Takeuchi K., Horiuchi T., Gunsalus I.C. J. Bacteriol. 166:1089-1095(1986).
[ 7] Joernvall H., Persson B., Du Bois G., Lavers G.C., Chen J.H., Gonzalez P., Rao P.V., Zigler J.S. Jr. FEBS Lett.
322:240-244(1993).

[0239] 47. (aldedh) Aldehyde dehydrogenases active sites

[0240] Aldehyde dehydrogenases (EC 1.2.1.3 and EC 1.2.1.5) are enzymes which oxidize a wide variety of aliphatic
and aromatic aldehydes. In mammals at least four different forms of the enzyme are known: class-1 (or Ald C) a
tetrameric cytosolic enzyme, class-2 (or Ald M) a tetrameric mitochondrial enzyme, class-3 (or Ald D) a dimeric cytosolic
enzyme, and class IV a microsomal enzyme. Aldehyde dehydrogenases have also been sequenced from fungal and
bacterial species. A number of enzymes are known to be evolutionary related to aldehyde dehydrogenases; these
enzymes are listed below:

- Plants and bacterial betaine-aldehyde dehydrogenase (EC 1.2.1.8) [2], an enzyme that catalyzes the last step
  in the biosynthesis of betaine.
- Plants and bacterial NADP-dependent glyceraldehyde-3-phosphate dehydrogenase (EC 1.2.1.9).
- Escherichia coli succinate-semialdehyde dehydrogenase (NADP+) (EC 1.2.1.16) (gene gabD) [3], which oxidizes
  succinate semialdehyde into succinate.
- Escherichia coli lactaldehyde dehydrogenase (EC 1.2.1.22) (gene ald) [4].
- Mammalian succinate semialdehyde dehydrogenase (NAD+) (EC 1.2.1.24).
- Escherichia coli phenylacetaldehyde dehydrogenase (EC 1.2.1.39).
- Escherichia coli 5-carboxymethyl-2-hydroxymuconate semialdehyde dehydrogenase (gene hpcC).
- Pseudomonas putida 2-hydroxymuconic semialdehyde dehydrogenase [5] (genes dmpC and xylG), an enzyme in the
  meta-cleavage pathway for the degradation of phenols, cresols and catechol.
- Bacterial and mammalian methylmalonate-semialdehyde dehydrogenase (MMSDH) (EC 1.2.1.27) [6], an enzyme
  involved in the distal pathway of valine catabolism.
- Yeast delta-1-pyrroline-5-carboxylate dehydrogenase (EC 1.5.1.12) [7] (gene PUT2), which converts proline to
  glutamate.
- Bacterial multifunctional putA protein, which contains a delta-1-pyrroline-5-carboxylate dehydrogenase domain.
- 26G, a garden pea protein of unknown function which is induced by dehydration of shoots [8].
- Mammalian formyltetrahydrofolate dehydrogenase (EC 1.5.1.6). This is a cytosolic enzyme responsible for the
  NADP-dependent decarboxylative reduction of 10-formyltetrahydrofolate into tetrahydrofolate. It is a protein of
  about 900 amino acids which consists of three domains; the C-terminal domain (480 residues) is structurally and
  functionally related to aldehyde dehydrogenases.
- Yeast hypothetical protein YBR006w.
- Yeast hypothetical protein YER073w.
- Yeast hypothetical protein YHR039c.
- Caenorhabditis elegans hypothetical protein F01F1.6.

A glutamic acid and a cysteine residue have been implicated in the catalytic activity of mammalian
aldehyde dehydrogenase. These residues are conserved in all the enzymes of this family. Two patterns have been
derived for this family, one for each of the active site residues.

Consensus pattern: [LIVMFGA]-E-[LIMSTAC]-[GS]-G-[KNLM]-[SADN]-[TAPFV] [E is the active site residue]

Consensus pattern: [FYLVA]-x(3)-G-[QE]-x-C-[LIVMGSTANC]-[AGCN]-x-[GSTADNEKR] [C is the active site residue]

[ 1] Hempel J., Harper K., Lindahl R. Biochemistry 28:1160-1167(1989).
[ 2] Weretilnyk E.A., Hanson A.D. Proc. Natl. Acad. Sci. U.S.A. 87:2745-2749(1990).
[ 3] Niegemann E., Schulz A., Bartsch K. Arch. Microbiol. 160:454-460(1993).
[ 4] Hidalgo E., Chen Y.-M., Lin E.C.C., Aguilar J. J. Bacteriol. 173:6118-6123(1991).
[ 5] Nordlund I., Shingler V. Biochim. Biophys. Acta 1049:227-230(1990).
[ 6] Steele M.I., Lorenz D., Hatter K., Park A., Sokatch J.R. J. Biol. Chem. 267:13585-13592(1992).
[ 7] Krzywicki K.A., Brandriss M.C. Mol. Cell. Biol. 4:2837-2842(1984).
[ 8] Guerrero F.D., Jones J.T., Mullet J.E. Plant Mol. Biol. 15:11-26(1990).
[ 9] Cook R.J., Lloyd R.S., Wagner C. J. Biol. Chem. 266:4965-4973(1991).

[0241] 48. Aldo/keto reductase family signatures

The aldo-keto reductase family [1,2] groups together a number of structurally and functionally related
NADPH-dependent oxidoreductases as well as some other proteins. The proteins known to belong to this family are:

- Aldehyde reductase (EC 1.1.1.2).
- Aldose reductase (EC 1.1.1.21).
- 3-alpha-hydroxysteroid dehydrogenase (EC 1.1.1.50), which terminates androgen action by converting
  5-alpha-dihydrotestosterone to 3-alpha-androstanediol.
- Prostaglandin F synthase (EC 1.1.1.188) which catalyzes the reduction of prostaglandins H2 and D2 to F2-alpha.
- D-sorbitol-6-phosphate dehydrogenase (EC 1.1.1.200) from apple.
- Morphine 6-dehydrogenase (EC 1.1.1.218) from Pseudomonas putida plasmid pMDH7.2 (gene morA).
- Chlordecone reductase (EC 1.1.1.225) which reduces the pesticide chlordecone (kepone) to the corresponding
  alcohol.
- 2,5-diketo-D-gluconic acid reductase (EC 1.1.1.-) which catalyzes the reduction of 2,5-diketogluconic acid to
  2-keto-L-gulonic acid, a key intermediate in the production of ascorbic acid.
- NAD(P)H-dependent xylose reductase (EC 1.1.1.-) from the yeast Pichia stipitis. This enzyme reduces xylose into
  xylitol.
- Trans-1,2-dihydrobenzene-1,2-diol dehydrogenase (EC 1.3.1.20).
- 3-oxo-5-beta-steroid 4-dehydrogenase (EC 1.3.99.6) which catalyzes the reduction of delta(4)-3-oxosteroids.
- A soybean reductase, which co-acts with chalcone synthase in the formation of 4,2',4'-trihydroxychalcone.
- Frog eye lens rho crystallin.
- Yeast GCY protein, whose function is not known.
- Leishmania major P110/11E protein. P110/11E is a developmentally regulated protein whose abundance is
  markedly elevated in promastigotes compared with amastigotes. Its exact function is not yet known.
- Escherichia coli hypothetical protein yafB.
- Escherichia coli hypothetical protein yghE.
- Yeast hypothetical protein YBR149w.
- Yeast hypothetical protein YHR104w.
- Yeast hypothetical protein YJR096w.

These proteins all have about 300 amino acid residues. Three consensus patterns have been developed that are
specific to this family of proteins. The first one is located in the N-terminal section of these proteins. The second
pattern is located in the central section. The third pattern, located in the C-terminal, is centered on a lysine residue
whose chemical modification, in aldose and aldehyde reductases, affects the catalytic efficiency.

Consensus pattern: G-[FY]-R-[HSAL]-[LIVMF]-D-[STAGC]-[AS]-x(5)-E-x(2)-[LIVM]-G
Consensus pattern: [LIVMFY]-x(9)-[KREQ]-x-[LIVM]-G-[LIVM]-[SC]-N-[FY]
Consensus pattern: [LIVM]-[PAIV]-[KR]-[ST]-x(4)-R-x(2)-[GSTAEQK]-[NSL]-x(2)-[LIVMFA] [K is a putative active site
residue]

[ 1] Bohren K.M., Bullock B., Wermuth B., Gabbay K.H. J. Biol. Chem. 264:9547-9551(1989).
[ 2] Bruce N.C., Willey D.L., Coulson A.F.W., Jeffery J. Biochem. J. 299:805-811(1994).

[0242] 49. Alpha amylase. This family is classified as family 13 of the glycosyl hydrolases. The structure is an 8
stranded alpha/beta barrel interrupted by a ~70 a.a. calcium-binding domain protruding between beta strand 3 and
alpha helix 3, and a carboxyl-terminal Greek key beta-barrel domain.

[1] Larson SB, Greenwood A, Cascio D, Day J, McPherson A. J Mol Biol 1994;235:1560-1584.
[0243] 50. Aminotransferases class-i pyridoxal-phosphate attachment site 

Aminotransferases share certain mechanistic features with other pyridoxal-phosphate dependent enzymes, such as
the covalent binding of the pyridoxal-phosphate group to a lysine residue. On the basis of sequence similarity, these
various enzymes can be grouped [1,2] into subfamilies. One of these, called class-I, currently consists of the following
enzymes: - Aspartate aminotransferase (AAT) (EC 2.6.1.1). AAT catalyzes the reversible transfer of the amino group
from L-aspartate to 2-oxoglutarate to form oxaloacetate and L-glutamate. In eukaryotes, there are two AAT isozymes:
one is located in the mitochondrial matrix, the second is cytoplasmic. In prokaryotes, only one form of AAT is found
(gene aspC). - Tyrosine aminotransferase (EC 2.6.1.5) which catalyzes the first step in tyrosine catabolism by reversibly
transferring its amino group to 2-oxoglutarate to form 4-hydroxyphenylpyruvate and L-glutamate. - Aromatic
aminotransferase (EC 2.6.1.57) involved in the synthesis of Phe, Tyr, Asp and Leu (gene tyrB). - 1-aminocyclopropane-
1-carboxylate synthase (EC 4.4.1.14) (ACC synthase) from plants. ACC synthase catalyzes the first step in ethylene
biosynthesis. - Pseudomonas denitrificans cobC, which is involved in cobalamin biosynthesis. - Yeast hypothetical
protein YJL069W. The sequence around the pyridoxal-phosphate attachment site of this class of enzyme is sufficiently
conserved to allow the creation of a specific pattern.

Consensus pattern: [GS]-[LIVMFYTAC]-[GSTA]-K-x(2)-[GSALVN]-[LIVMFA]-x-[GNAR]-x-R-[LIVMA]-[GA] [K is the
pyridoxal-P attachment site]

[1] Bairoch A. Unpublished observations (1992).
[2] Sung M.H., Tanizawa K., Tanaka H., Kuramitsu S., Kagamiyama H., Hirotsu K., Okamoto A., Higuchi T., Soda
K. J. Biol. Chem. 266:2567-2572(1991).

[0244] 51. Aminotransferases class-II pyridoxal-phosphate attachment site

Aminotransferases share certain mechanistic features with other pyridoxal-phosphate dependent enzymes, such as
the covalent binding of the pyridoxal-phosphate group to a lysine residue. On the basis of sequence similarity, these
various enzymes can be grouped [1] into subfamilies. One of these, called class-II, currently consists of the following
enzymes: - Glycine acetyltransferase (EC 2.3.1.29), which catalyzes the addition of acetyl-CoA to glycine to form
2-amino-3-oxobutanoate (gene kbl). - 5-aminolevulinic acid synthase (EC 2.3.1.37) (delta-ALA synthase), which
catalyzes the first step in heme biosynthesis via the Shemin (or C4) pathway, i.e. the addition of succinyl-CoA to glycine
to form 5-aminolevulinate. - 8-amino-7-oxononanoate synthase (EC 2.3.1.47) (7-KAP synthetase), a bacterial enzyme
(gene bioF) which catalyzes an intermediate step in the biosynthesis of biotin: the addition of 6-carboxy-hexanoyl-CoA
to alanine to form 8-amino-7-oxononanoate. - Histidinol-phosphate aminotransferase (EC 2.6.1.9), which catalyzes
the eighth step in the histidine biosynthetic pathway: the transfer of an amino group from 3-(imidazol-4-yl)-2-oxopropyl
phosphate to glutamic acid to form histidinol phosphate and 2-oxoglutarate. - Serine palmitoyltransferase (EC 2.3.1.50)
from yeast (genes LCB1 and LCB2), which catalyzes the condensation of palmitoyl-CoA and serine to form
3-ketosphinganine. The sequence around the pyridoxal-phosphate attachment site of this class of enzyme is sufficiently
conserved to allow the creation of a specific pattern.

Consensus pattern: T-[LIVMFYW]-[STAG]-K-[SAG]-[LIVMFYWR]-[SAG]-x(2)-[SAG]
[K is the pyridoxal-P attachment site]
[1] Bairoch A. Unpublished observations (1991).

[0245] 52. Aminotransferases class-III pyridoxal-phosphate attachment site

Aminotransferases share certain mechanistic features with other pyridoxal-phosphate dependent enzymes, such as
the covalent binding of the pyridoxal-phosphate group to a lysine residue. On the basis of sequence similarity, these
various enzymes can be grouped [1,2] into subfamilies. One of these, called class-III, currently consists of the following
enzymes: - Acetylornithine aminotransferase (EC 2.6.1.11) which catalyzes the transfer of an amino group from
acetylornithine to alpha-ketoglutarate, yielding N-acetyl-glutamic-5-semi-aldehyde and glutamic acid. - Ornithine
aminotransferase (EC 2.6.1.13), which catalyzes the transfer of an amino group from ornithine to alpha-ketoglutarate,
yielding glutamic-5-semi-aldehyde and glutamic acid. - Omega-amino acid-pyruvate aminotransferase (EC 2.6.1.18),
which catalyzes transamination between a variety of omega-amino acids, mono- and diamines, and pyruvate. It plays
a pivotal role in omega amino acids metabolism. - 4-aminobutyrate aminotransferase (EC 2.6.1.19) (GABA
transaminase), which catalyzes the transfer of an amino group from GABA to alpha-ketoglutarate, yielding succinate
semialdehyde and glutamic acid. - DAPA aminotransferase (EC 2.6.1.62), a bacterial enzyme (gene bioA) which catalyzes
an intermediate step in the biosynthesis of biotin, the transamination of 7-keto-8-aminopelargonic acid (7-KAP) to form
7,8-diaminopelargonic acid (DAPA). - 2,2-dialkylglycine decarboxylase (EC 4.1.1.64), a Pseudomonas cepacia
enzyme (gene dgdA) that catalyzes the decarboxylating amino transfer of 2,2-dialkylglycine and pyruvate to dialkyl ketone,
alanine and carbon dioxide. - Glutamate-1-semialdehyde aminotransferase (EC 5.4.3.8) (GSA). GSA is the enzyme
involved in the second step of porphyrin biosynthesis, via the C5 pathway. It transfers the amino group on carbon 2 of
glutamate-1-semialdehyde to the neighbouring carbon, to give delta-aminolevulinic acid. - Bacillus subtilis
aminotransferase yhxA. - Bacillus subtilis aminotransferase yodT. - Haemophilus influenzae aminotransferase HI0949. -
Caenorhabditis elegans aminotransferase T01B11.2. The sequence around the pyridoxal-phosphate attachment site
of this class of enzyme is sufficiently conserved to allow the creation of a specific pattern.

Consensus pattern: [LIVMFYWC](2)-x-D-E-[IVA]-x(2)-G-[LIVMFAGC]-x(0,1)-[RSACLI]-x-[GSAD]-x(12,16)-D-[LIVMFC]-
[LIVMFYSTA]-x(2)-[GSA]-K-x(3)-[GSTADNV]-[GSAC] [K is the pyridoxal-P attachment site]

[1] Bairoch A. Unpublished observations (1992).
[2] Yonaha K., Nishie M., Aibara S. J. Biol. Chem. 267:12506-12510(1992).

[0246] 53. Ank repeat. There is no clear separation between noise and signal on the HMM search. Ankyrin repeats
generally consist of a beta, alpha, alpha, beta order of secondary structures. The repeats associate to form a higher
order structure.

[1] A, Holak TA; FEBS Lett 1997;401:127-132.
[2] Lux SE, John KM, Bennett V; Nature 1990;345:736-739.

[0247] 54. Aminotransferases class-IV signature

[0248] Aminotransferases share certain mechanistic features with other pyridoxal-phosphate dependent enzymes,
such as the covalent binding of the pyridoxal-phosphate group to a lysine residue. On the basis of sequence similarity,
these various enzymes can be grouped [1,2] into subfamilies. One of these, called class-IV, currently consists of the
following enzymes:

- Branched-chain amino-acid aminotransferase (EC 2.6.1.42) (transaminase B), a bacterial (gene ilvE) and
eukaryotic enzyme which catalyzes the reversible transfer of an amino group from 4-methyl-2-oxopentanoate to
glutamate, to form leucine and 2-oxoglutarate.

- D-alanine aminotransferase (EC 2.6.1.21). A bacterial enzyme which catalyzes the transfer of the amino group
from D-alanine (and other D-amino acids) to 2-oxoglutarate, to form pyruvate and D-aspartate.
- 4-amino-4-deoxychorismate (ADC) lyase (gene pabC). A bacterial enzyme that converts ADC into
4-aminobenzoate (PABA) and pyruvate.

[0249] The above enzymes are proteins of about 270 to 415 amino-acid residues that share a few regions of sequence
similarity. Surprisingly, the best-conserved region does not include the lysine residue to which the pyridoxal-phosphate
group is known to be attached, in ilvE. The region that has been selected as a signature pattern is located some 40
residues at the C-terminus side of the PlP-lysine.
Consensus pattern: E-x-[STAGCI]-x(2)-N-[LIVMFAC]-[FY]-x(6,12)-[LIVMF]-x-T-x(6,8)-[LIVM]-x-[GS]-[LIVM]-x-[KR]

[1] Green J.M., Merkel W.K., Nichols B.P. J. Bacteriol. 174:5317-5323(1992).
[2] Bairoch A. Unpublished observations (1992).

[0250] 55. Aminotransferases class-V pyridoxal-phosphate attachment site

Aminotransferases share certain mechanistic features with other pyridoxal-phosphate dependent enzymes, such as
the covalent binding of the pyridoxal-phosphate group to a lysine residue. On the basis of sequence similarity, these
various enzymes can be grouped [1,2] into subfamilies. One of these, called class-V, currently consists of the following
enzymes: - Phosphoserine aminotransferase (EC 2.6.1.52), an enzyme which catalyzes the reversible interconversion
of phosphoserine and 2-oxoglutarate to 3-phosphonooxypyruvate and glutamate. It is required both in the major
phosphorylated pathway of serine biosynthesis and in pyridoxine biosynthesis. The bacterial enzyme (gene serC) is highly
similar to a rabbit endometrial progesterone-induced protein (EPIP), which is probably a phosphoserine
aminotransferase [3]. - Serine--glyoxylate aminotransferase (EC 2.6.1.45) (SGAT) (gene sgaA) from Methylobacterium
extorquens. - Serine--pyruvate aminotransferase (EC 2.6.1.51). This enzyme also acts as an alanine--glyoxylate
aminotransferase (EC 2.6.1.44). In vertebrates, it is located in the peroxisomes and/or mitochondria. - Isopenicillin N
epimerase (gene cefD). This enzyme is involved in the biosynthesis of cephalosporin antibiotics and catalyzes the
reversible isomerization of isopenicillin N and penicillin N. - NifS, a protein of the nitrogen fixation operon of some
bacteria and cyanobacteria. The exact function of nifS is not yet known. A highly similar protein has been found in fungi
(gene NFS1 or SPL1). - The small subunit of cyanobacterial soluble hydrogenase (EC 1.12.-.-). - Hypothetical protein
yebU from Bacillus subtilis. - Hypothetical protein YFL030w from yeast. The sequence around the pyridoxal-phosphate
attachment site of this class of enzyme is sufficiently conserved to allow the creation of a specific pattern.
Consensus pattern: [LIVFYCHT]-[DGH]-[LIVMFYAC]-[LIVMFYA]-x(2)-[GSTAC]-[GSTA]-[HQR]-K-x(4,6)-G-x-[GSAT]-
x-[LIVMFYSAC] [K is the pyridoxal-P attachment site]

[1] Ouzounis C., Sander C. FEBS Lett. 322:159-164(1993).
[2] Bairoch A. Unpublished observations (1992).

[3] van der Zel A., Lam H.-M., Winkler M.E. Nucleic Acids Res. 17:8379-8379(1989).

[0251] 56. Annexins repeated domain signature

Annexins [1 to 6] are a group of calcium-binding proteins that associate reversibly with membranes. They bind to
phospholipid bilayers in the presence of micromolar free calcium concentration. The binding is specific for calcium and
for acidic phospholipids. Annexins have been claimed to be involved in cytoskeletal interactions, phospholipase
inhibition, intracellular signalling, anticoagulation, and membrane fusion. Each of these proteins consists of an N-terminal
domain of variable length followed by four or eight copies of a conserved segment of sixty one residues. The repeat
(sometimes known as an 'endonexin fold') consists of five alpha-helices that are wound into a right-handed superhelix
[7]. The proteins known to belong to the annexin family are listed below: - Annexin I (Lipocortin 1) (Calpactin 2) (p35)
(Chromobindin 9). - Annexin II (Lipocortin 2) (Calpactin 1) (Protein I) (p36) (Chromobindin 8). - Annexin III (Lipocortin
3) (PAP-III). - Annexin IV (Lipocortin 4) (Endonexin I) (Protein II) (Chromobindin 4). - Annexin V (Lipocortin 5)
(Endonexin 2) (VAC-alpha) (Anchorin CII) (PAP-I). - Annexin VI (Lipocortin 6) (Protein III) (Chromobindin 20) (p68) (p70). This
is the only known annexin that contains 8 (instead of 4) repeats. - Annexin VII (Synexin). - Annexin VIII (Vascular
anticoagulant-beta) (VAC-beta). - Annexin IX from Drosophila. - Annexin X from Drosophila. - Annexin XI (Calcyclin-
associated annexin) (CAP-50). - Annexin XII from Hydra vulgaris. - Annexin XIII (Intestine-specific annexin) (ISA). The
signature pattern for this domain spans positions 9 to 61 of the repeat and includes the only perfectly conserved residue
(an arginine in position 22).

Consensus pattern: [TG]-[STV]-x(8)-[LIVMF]-x(2)-R-x(3)-[DEQNH]-x(7)-[IFY]-x(7)-[LIVMF]-x(3)-[LIVMF]-x(11)-
[LIVMFA]-x(2)-[LIVMF]
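
The positions quoted for the annexin signature can be checked by simple arithmetic on the pattern. The following Python sketch is only an illustration: it assumes the pattern reads as written in the `ANNEXIN` string below, and the helper names `element_lengths` and `span_and_offset` are hypothetical. It sums the element lengths and locates the arginine, assuming the pattern starts at position 9 of the repeat.

```python
import re

# Annexin repeat pattern, assumed to read as follows (fixed repeat counts only).
ANNEXIN = ("[TG]-[STV]-x(8)-[LIVMF]-x(2)-R-x(3)-[DEQNH]-x(7)-[IFY]-x(7)-"
           "[LIVMF]-x(3)-[LIVMF]-x(11)-[LIVMFA]-x(2)-[LIVMF]")


def element_lengths(pattern):
    """Return (element, length) pairs; 'x(8)' counts as 8 positions, '[TG]' as 1."""
    pairs = []
    for element in pattern.split("-"):
        m = re.fullmatch(r"(.+?)(?:\((\d+)\))?", element)
        pairs.append((m.group(1), int(m.group(2) or 1)))
    return pairs


def span_and_offset(pattern, residue):
    """Return total pattern length and the 0-based offset of the first bare `residue`."""
    total, offset = 0, None
    for core, n in element_lengths(pattern):
        if core == residue and offset is None:
            offset = total
        total += n
    return total, offset


length, r_offset = span_and_offset(ANNEXIN, "R")
start = 9  # first repeat position covered by the pattern, as stated in the text
print(length, start + length - 1, start + r_offset)
```

Under these assumptions the pattern covers 53 positions, so a pattern starting at position 9 ends at position 61 of the 61-residue repeat, with the arginine falling at position 22.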

[1] Raynal P., Pollard H.B. Biochim. Biophys. Acta 1197:83-93(1994).
[2] Barton G.J., Newman R.H., Freemont P.S., Crumpton M.J. Eur. J. Biochem. 198:749-760(1991).
[3] Burgoyne R.D., Geisow M.J. Cell Calcium 10:1-10(1989).
[4] Haigler H.T., Fitch J.M., Jones J.M., Schlaepfer D.D. Trends Biochem. Sci. 14:48-50(1989).
[5] Klee C.B. Biochemistry 27:6645-6653(1988).
[6] Smith P.D., Moss S.E. Trends Genet. 10:241-248(1994).
[7] Huber R., Roemisch J., Paques E.-P. EMBO J. 9:3867-3874(1990).
[8] Fiedler K., Simons K. Trends Biochem. Sci. 20:177-178(1995).

[0252] 57. (arf) ADP-ribosylation factors family signature

ADP-ribosylation factors (ARF) [1,2,3,4] are 20 Kd GTP-binding proteins involved in protein trafficking. They may
modulate vesicle budding and uncoating within the Golgi apparatus. ARF's also act as allosteric activators of cholera toxin
ADP-ribosyltransferase activity. They are evolutionary conserved and present in all eukaryotes. At least six forms of
ARF are present in mammals and three in budding yeast. The ARF family also includes proteins highly related to ARF's
but which lack the cholera toxin cofactor activity; they are collectively known as ARL's (ARF-like). ARD1 is a 64 Kd
mammalian protein of unknown biological function that contains an ARF domain at its C-terminal extremity. Proteins
from the ARF family are generally included in the RAS 'superfamily' of small GTP-binding proteins, but they are
only slightly related to the other RAS proteins. They also differ from RAS proteins in that they lack cysteine residues
at their C-termini and are therefore not subject to prenylation. The ARFs are N-terminally myristoylated (the ARLs have
not yet been shown to be modified in such a fashion). A conserved region in the C-terminal part of ARF's and ARL's
has been selected as a signature pattern.

Consensus pattern: [HRQT]-x-[FYWI]-x-[LIVM]-x(4)-A-x(2)-G-x(2)-[LIVM]-x(2)-[GSA]-[LIVMF]-x-[WK]-[LIVM]

Note: proteins belonging to this family also contain a copy of the ATP/GTP-binding motif 'A' (P-loop) (see <PDOC00017>).
[1] Boman A.L., Kahn R.A. Trends Biochem. Sci. 20:147-150(1995).
[2] Moss J., Vaughan M. Cell. Signal. 5:367-379(1993).
[3] Moss J., Vaughan M. Prog. Nucleic Acid Res. Mol. Biol. 45:47-65(1993).
[4] Amor J.C., Harrison D.H., Kahn R.A., Ringe D. Nature 372:704-708(1994).
[5] Valencia A., Chardin P., Wittinghofer A., Sander C. Biochemistry 30:4637-4648(1991).

[0253] (arf_2) ATP/GTP-binding site motif A (P-loop)

From sequence comparisons and crystallographic data analysis it has been shown [1,2,3,4,5,6] that an appreciable
proportion of proteins that bind ATP or GTP share a number of more or less conserved sequence motifs. The best
conserved of these motifs is a glycine-rich region, which typically forms a flexible loop between a beta-strand and an
alpha-helix. This loop interacts with one of the phosphate groups of the nucleotide. This sequence motif is generally
referred to as the 'A' consensus sequence [1] or the 'P-loop' [5]. There are numerous ATP- or GTP-binding proteins in
which the P-loop is found. A number of protein families for which the relevance of the presence of such motif has been
noted are listed below: - ATP synthase alpha and beta subunits (see <PDOC00137>). - Myosin heavy chains. - Kinesin
heavy chains and kinesin-like proteins (see <PDOC00343>). - Dynamins and dynamin-like proteins (see
<PDOC00362>). - Guanylate kinase (see <PDOC00670>). - Thymidine kinase (see <PDOC00524>). - Thymidylate
kinase (see <PDOC01034>). - Shikimate kinase (see <PDOC00868>). - Nitrogenase iron protein family (nifH/frxC)
(see <PDOC00560>). - ATP-binding proteins involved in 'active transport' (ABC transporters) [7] (see <PDOC00135>).
- DNA and RNA helicases [8,9,10]. - GTP-binding elongation factors (EF-Tu, EF-1alpha, EF-G, EF-2, etc.). - Ras family
of GTP-binding proteins (Ras, Rho, Rab, Ral, Ypt1, SEC4, etc.). - Nuclear protein ran (see <PDOC00859>). -
ADP-ribosylation factors family (see <PDOC00781>). - Bacterial dnaA protein (see <PDOC00771>). - Bacterial recA protein
(see <PDOC00131>). - Bacterial recF protein (see <PDOC00530>). - Guanine nucleotide-binding proteins alpha
subunits (Gi, Gs, Gt, G0, etc.). - DNA mismatch repair proteins mutS family (see <PDOC00388>). - Bacterial type II
secretion system protein E (see <PDOC00567>). Not all ATP- or GTP-binding proteins are picked up by this motif. A
number of proteins escape detection because the structure of their ATP-binding site is completely different from that
of the P-loop. Examples of such proteins are the E1-E2 ATPases or the glycolytic kinases. In other ATP- or GTP-binding
proteins the flexible loop exists in a slightly different form; this is the case for tubulins or protein kinases. A special
mention must be reserved for adenylate kinase, in which there is a single deviation from the P-loop pattern: in the last
position Gly is found instead of Ser or Thr.

Consensus pattern: [AG]-x(4)-G-K-[ST]
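
As a rough illustration of how a consensus pattern of this kind can be applied to a sequence, the following Python sketch translates the pattern notation into a regular expression and scans a sequence with it. The function names `prosite_to_regex` and `find_motif` and both test fragments are assumptions made for this example and are not part of the procedure described here; only `x`, `[..]`, `{..}` and `(n)` or `(n,m)` repeats are handled.

```python
import re


def prosite_to_regex(pattern):
    """Translate a pattern such as '[AG]-x(4)-G-K-[ST]' into a Python regex string."""
    parts = []
    for element in pattern.strip().split("-"):
        if not element:
            continue  # tolerate a stray trailing '-'
        # Separate an optional repeat suffix, e.g. 'x(4)' or 'x(5,6)'.
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, lo, hi = m.group(1), m.group(2), m.group(3)
        if core == "x":
            core = "."                        # any residue
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"    # any residue except those listed
        # '[...]' classes and single letters are already valid regex atoms.
        if lo is not None:
            core += "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(core)
    return "".join(parts)


def find_motif(pattern, sequence):
    """Return (1-based start, matched text) for each non-overlapping match."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start() + 1, m.group()) for m in regex.finditer(sequence)]


P_LOOP = "[AG]-x(4)-G-K-[ST]"

# Illustrative fragment containing a glycine-rich loop followed by Lys-Ser.
hits = find_motif(P_LOOP, "MTEYKLVVVGAGGVGKSALTIQ")

# Made-up fragment with Gly in the last position, the deviation described
# above for adenylate kinase; the pattern is expected to miss it.
misses = find_motif(P_LOOP, "GGPGSGKGT")

print(prosite_to_regex(P_LOOP), hits, misses)
```

The second fragment shows why a strict pattern can miss genuine nucleotide-binding proteins: a single substitution at the final [ST] position is enough to prevent a match.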

[1] Walker J.E., Saraste M., Runswick M.J., Gay N.J. EMBO J. 1:945-951(1982).
[2] Moller W., Amons R. FEBS Lett. 186:1-7(1985).
[3] Fry D.C., Kuby S.A., Mildvan A.S. Proc. Natl. Acad. Sci. U.S.A. 83:907-911(1986).
[4] Dever T.E., Glynias M.J., Merrick W.C. Proc. Natl. Acad. Sci. U.S.A. 84:1814-1818(1987).
[5] Saraste M., Sibbald P.R., Wittinghofer A. Trends Biochem. Sci. 15:430-434(1990).
[6] Koonin E.V. J. Mol. Biol. 229:1165-1174(1993).
[7] Higgins C.F., Hyde S.C., Mimmack M.M., Gileadi U., Gill D.R., Gallagher M.P. J. Bioenerg. Biomembr. 22:
571-592(1990).
[8] Hodgman T.C. Nature 333:22-23(1988) and Nature 333:578-578(1988) (Errata).
[9] Linder P., Lasko P., Ashburner M., Leroy P., Nielsen P.J., Nishi K., Schnier J., Slonimski P.P. Nature 337:121-122
(1989).
[10] Gorbalenya A.E., Koonin E.V., Donchenko A.P., Blinov V.M. Nucleic Acids Res. 17:4713-4730(1989).

[0254] 58. Arginase family signatures

The following enzymes have been shown [1] to be evolutionary related: - Arginase (EC 3.5.3.1), a ubiquitous enzyme
which catalyzes the degradation of arginine to ornithine and urea [2]. - Agmatinase (EC 3.5.3.11) (agmatine
ureohydrolase), a prokaryotic enzyme (gene speB) that catalyzes the hydrolysis of agmatine into putrescine and urea. -
Formiminoglutamase (EC 3.5.3.8) (formiminoglutamate hydrolase), a prokaryotic enzyme (gene hutG) that hydrolyzes
N-formimino-glutamate into glutamate and formamide. - Hypothetical proteins from methanogenic archaebacteria.
These enzymes are proteins of about 300 amino acid residues. Three conserved regions that contain charged residues
which are involved in the binding of the two manganese ions [3] can be used as signature patterns.
Consensus pattern: [LIVMF]-G-G-x-H-x-[LIVMT]-[STAV]-x-[PAG]-x(3)-[GSTA] [H binds manganese]
Consensus pattern: [LIVM](2)-x-[LIVMFY]-D-[AS]-H-x-D [The two D's and the H bind manganese]
Consensus pattern: [ST]-[LIVMFY]-D-[LIVM]-D-x(3)-[PAQ]-x(3)-P-[GSA]-x(7)-G [The two D's bind manganese]

[1] Ouzounis C., Kyrpides N.C. J. Mol. Evol. 39:101-104(1994).
[2] Jenkinson C.P., Grody W.W., Cederbaum S.D. Comp. Biochem. Physiol. 114B:107-132(1996).
[3] Kanyo Z.F., Scolnick L.R., Ash D.E., Christianson D.W. Nature 383:554-557(1996).
[0255] 59. (asp) Eukaryotic and viral aspartyl proteases active site

Aspartyl proteases, also known as acid proteases, (EC 3.4.23.-) are a widely distributed family of proteolytic enzymes
[1,2,3] known to exist in vertebrates, fungi, plants, retroviruses and some plant viruses. Aspartate proteases of
eukaryotes are monomeric enzymes which consist of two domains. Each domain contains an active site centered on a catalytic
aspartyl residue. The two domains most probably evolved from the duplication of an ancestral gene encoding a
primordial domain. Currently known eukaryotic aspartyl proteases are: - Vertebrate gastric pepsins A and C (also known as
gastricsin). - Vertebrate chymosin (rennin), involved in digestion and used for making cheese. - Vertebrate lysosomal
cathepsins D (EC 3.4.23.5) and E (EC 3.4.23.34). - Mammalian renin (EC 3.4.23.15) whose function is to generate
angiotensin I from angiotensinogen in the plasma. - Fungal proteases such as aspergillopepsin A (EC 3.4.23.18),
candidapepsin (EC 3.4.23.24), mucoropepsin (EC 3.4.23.23) (mucor rennin), endothiapepsin (EC 3.4.23.22),
polyporopepsin (EC 3.4.23.29), and rhizopuspepsin (EC 3.4.23.21). - Yeast saccharopepsin (EC 3.4.23.25) (proteinase A)
(gene PEP4). PEP4 is implicated in posttranslational regulation of vacuolar hydrolases. - Yeast barrier pepsin (EC
3.4.23.35) (gene BAR1); a protease that cleaves alpha-factor and thus acts as an antagonist of the mating pheromone.
- Fission yeast sxa1 which is involved in degrading or processing the mating pheromones. Most retroviruses and some
plant viruses, such as badnaviruses, encode an aspartyl protease which is a homodimer of a chain of about 95 to
125 amino acids. In most retroviruses, the protease is encoded as a segment of a polyprotein which is cleaved during
the maturation process of the virus. It is generally part of the pol polyprotein and, more rarely, of the gag polyprotein.
Conservation of the sequence around the two aspartates of eukaryotic aspartyl proteases and around the single active
site of the viral proteases allows us to develop a single signature pattern for both groups of protease.
Consensus pattern: [LIVMFGAC]-[LIVMTADN]-[LIVFSA]-D-[ST]-G-[STAV]-[STAPDENQ]-x-[LIVMFSTNC]-x-[LIVMFGTA]
[D is the active site residue]. Note: these proteins belong to families A1 and A2 in the classification of peptidases
[4,E1].

[1] Foltmann B. Essays Biochem. 17:52-84(1981).
[2] Davies D.R. Annu. Rev. Biophys. Chem. 19:189-215(1990).
[3] Rao J.K.M., Erickson J.W., Wlodawer A. Biochemistry 30:4663-4671(1991).
[4] Rawlings N.D., Barrett A.J. Meth. Enzymol. 248:105-120(1995).

[0256] 60. (BIRA) Biotin repressor

[1] Wilson KP, Shewchuk LM, Brennan RG, Otsuka AJ, Matthews BW; Proc Natl Acad Sci USA 1992;89:9257-9261.
[0257] 61. BTB/POZ domain

The BTB (for BR-C, ttk and bab) [1] or POZ (for Pox virus and Zinc finger) [2] domain is present near the N-terminus
of a fraction of zinc finger (zf-C2H2) proteins and in proteins that contain the Kelch motif such as Kelch and a family
of pox virus proteins. The BTB/POZ domain mediates homomeric dimerisation and in some
instances heteromeric dimerisation [2]. The structure of the dimerised PLZF BTB/POZ domain has been solved and
consists of a tightly intertwined homodimer. The central scaffolding of the protein is made up of a cluster of
alpha-helices flanked by short beta-sheets at both the top and bottom of the molecule [3]. POZ domains from several zinc
finger proteins have been shown to mediate transcriptional repression and to interact with components of histone
deacetylase co-repressor complexes including N-CoR and SMRT [4,5,6]. The POZ or BTB domain is also known as
BR-C/Ttk or ZiN.

[1] Zollman S, Godt D, Prive GG, Couderc JL, Laski FA; Proc Natl Acad Sci USA 1994;91:10717-10721.
[2] Bardwell VJ, Treisman R; Genes Dev 1994;8:1664-1677.
[3] Ahmad KF, Engel CK, Prive GG; Proc Natl Acad Sci USA 1998;95:12123-12128.
[4] Deweindt C, Albagli O, Bernardin F, Dhordain P, Quief S, Lantoine D, Kerckaert JP, Leprince D; Cell Growth
Differ 1995;6:1495-1503.
[5] Huynh KD, Bardwell VJ; Oncogene 1998;17:2473-2484.
[6] Wong CW, Privalsky ML; J Biol Chem 1998;273:27695-27702.

[0258] 62. (Bac_GSPproteins) Bacterial type II secretion system protein D signature

A number of bacterial proteins, some of which are involved in a general secretion pathway (GSP) for the export of
proteins (also called the type II pathway) [1 to 5], have been found to be evolutionary related. These proteins are listed
below: - The 'D' protein from the GSP operon of Aeromonas (gene exeD), Erwinia (gene outD), Escherichia coli (gene
yheF), Klebsiella pneumoniae (gene pulD), Pseudomonas aeruginosa (gene xcpQ), Vibrio cholerae (gene epsD) and
Xanthomonas campestris (gene xpsD). - comE from Haemophilus influenzae, involved in competence (DNA uptake).
- pilQ from Pseudomonas aeruginosa, which is essential for the formation of the pili. - hofQ (hopQ) from Escherichia
coli. - hrpH from Pseudomonas syringae, which is involved in the secretion of a proteinaceous elicitor of the
hypersensitivity response in plants. - hrpA1 from Xanthomonas campestris pv. vesicatoria, which is also involved in the
hypersensitivity response. - mxiD from Shigella flexneri, which is involved in the secretion of the Ipa invasins which are
necessary for penetration of intestinal epithelial cells. - omc from Neisseria gonorrhoeae. - yssC from Yersinia
enterocolitica virulence plasmid, which seems to be required for the export of the Yop virulence proteins. - The gpIV
protein from filamentous phages such as f1, ike, or m13. GpIV is said to be involved in phage assembly and
morphogenesis. These proteins all seem to start with a signal sequence and are thought to be integral proteins in the outer
membrane. As a signature pattern, a conserved region in the C-terminal section of these proteins has been selected.

Consensus pattern: [GR]-[DEQKG]-[STVM]-[LIVMA](3)-[GA]-G-[LIVMFY]-x(11)-[LIVM]-P-[LIVMFYWGS]-[LIVMF]-
[GSAE]-x-[LIVM]-P-[LIVMFYW](2)-x(2)-[LV]-F

[1] Salmond G.P.C., Reeves P.J. Trends Biochem. Sci. 18:7-12(1993).
[2] Reeves P.J., Whitcombe D., Wharam S., Gibson M., Allison G., Bunce N., Barallon R., Douglas P., Mulholland
V., Stevens S., Walker S., Salmond G.P.C. Mol. Microbiol. 8:443-458(1993).
[3] Martin P.R., Hobbs M., Free P.D., Jeske Y., Mattick J.S. Mol. Microbiol. 9:857-868(1993).
[4] Hobbs M., Mattick J.S. Mol. Microbiol. 10:233-243(1993).
[5] Genin S., Boucher C.A. Mol. Gen. Genet. 243:112-118(1994).

[0259] 63. (Bac_globin) Protozoan/cyanobacterial globins signature

Globins are heme-containing proteins involved in binding and/or transporting oxygen [1]. Almost all globins belong to
a large family (see <PDOC00793>); the only exceptions are the following proteins, which form a family of their own
[2,3]: - Monomeric hemoglobins from the protozoan Paramecium caudatum, Tetrahymena pyriformis and Tetrahymena
thermophila. - Cyanoglobin from the cyanobacteria Nostoc commune. - Globins LI637 and LI410 from the chloroplast
of the alga Chlamydomonas eugametos. - Mycobacterium tuberculosis hypothetical protein MtCY46.23. These proteins
contain a conserved histidine which could be involved in heme-binding. As a signature pattern, a conserved region
that ends with this residue was used.

Consensus pattern: F-[LF]-x(5)-G-[PA]-x(4)-G-[KRA]-x-[LIVM]-x(3)-H

[1] Concise Encyclopedia Biochemistry, Second Edition, Walter de Gruyter, Berlin New-York (1988).
[2] Takagi T. Curr. Opin. Struct. Biol. 3:413-418(1993).
[3] Couture M., Chamberland H., St-Pierre B., Lafontaine J., Guertin M. Mol. Gen. Genet. 243:185-197(1994).

[0260] 64. Band 7 protein family signature

Mammalian band 7 protein [1] (also known as 7.2B or stomatin) is an integral membrane phosphoprotein of red blood
cells thought to regulate cation conductance by interacting with other proteins of the junctional complex of the
membrane skeleton. Structurally, band 7 is evolutionary related to the following proteins: - Caenorhabditis elegans protein
mec-2 [2]. Mec-2 positively regulates the activity of the putative mechanosensory transduction channel. It may link
the mechanosensory channel and the microtubule cytoskeleton of the touch receptor neurons. - Caenorhabditis elegans
proteins sto-1 to sto-4. - Caenorhabditis elegans protein unc-1. - Escherichia coli hypothetical protein ybbK. -
Mycobacterium tuberculosis hypothetical protein MtCY277.09. - Synechocystis strain PCC 6803 hypothetical protein slr1123.
- Methanococcus jannaschii hypothetical protein MJ0827. Structurally, all these proteins consist of a short N-terminal
domain which is followed by a transmembrane region and a variable size (from 170 to 350 residues) C-terminal domain.
As a signature pattern, a conserved region located about 110 residues after the transmembrane domain was selected.

Consensus pattern: R-x(2)-[LIV]-[SAN]-x(8)-[LIV]-D-x(2)-T-x(2)-W-G-[LIV]-[KRH]-[LIV]-x-[KR]-[LIV]-E-[LIV]-[KR]

[1] Gallagher P.G., Forget B.G. J. Biol. Chem. 270:26358-26363(1995).
[2] Huang M., Gu G., Ferguson E.L., Chalfie M. Nature

[0261] 65. Barwin domain signatures

Barwin [1] is a barley seed protein of 125 residues that binds weakly a chitin analog. It contains six cysteines involved
in disulfide bonds, as shown in the following schematic representation.

[Schematic representation: positions of the six conserved cysteines along the barwin sequence.]

'C': conserved cysteine involved in a disulfide bond. '*': position of the patterns.

Barwin is closely related to the following proteins: - Hevein, a wound-induced protein found in the latex of rubber trees. - HEL,
an Arabidopsis thaliana hevein-like protein [2]. - Win1 and win2, two wound-induced proteins from potato. -
Pathogenesis-related protein 4 from tobacco. Hevein and the win1/2 proteins consist of an N-terminal chitin-binding domain
followed by a barwin-like C-terminal domain. Barwin and its related proteins could be involved in a defense mechanism
in plants. As signature patterns, two highly conserved regions that contain some of the cysteines were selected.
Consensus pattern: C-G-[KR]-C-L-x-V-x-N [The two C's are involved in disulfide bonds]
Consensus pattern: V-[DN]-Y-[EQ]-F-V-[DN]-C [C is involved in a disulfide bond]

[1] Svensson B., Svendsen I., Hoejrup P., Roepstorff P., Ludvigsen S., Poulsen F.M. Biochemistry 31:8767-8770
(1992).
[2] Potter S., Uknes S., Lawton K., Winter A.M., Chandler D., Dimaio J., Novitzky R., Ward E., Ryals J. Mol. Plant
Microbe Interact. 6:680-685(1993).

[0262] 66. (Bowman-Birk_leg) Bowman-Birk serine protease inhibitors family signature

The Bowman-Birk inhibitor family [1] is one of the numerous families of serine proteinase
inhibitors. As can be seen in the schematic representation, they have a duplicated structure and generally possess
two distinct inhibitory sites:

[Schematic representation of the disulfide bonds over about 70 residues; the connector lines are not legible in the source.]

xxCCxxCxxCxx#*xCxxCxxxxCxxxCxxxCxxxxCxx#xxCxxCxxCxxCxx

'C': conserved cysteine involved in a disulfide bond.
'#': active site residue.
'*': position of the pattern.

[0263] These inhibitors are found in the seeds of all leguminous plants as well as in cereal grains. In cereals they exist in two forms, one of which is a duplication of the basic structure shown above [2]. The pattern that was developed to pick up sequences belonging to this family of inhibitors is in the central part of the domain and includes four cysteines.
[0264] Consensus pattern: C-x(5,6)-[DENQKRHSTA]-C-[PASTDH]-[PASTDK]-[ASTDV]-C-[NDKS]-[DEKRHSTA]-C [The four C's are involved in disulfide bonds]. Note: this pattern can be found twice in some duplicated cereal inhibitors.
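Consensus patterns written in this notation map directly onto regular expressions: `x` stands for any residue, `[..]` lists the residues allowed at a position, `{..}` lists excluded residues, and `(n)` or `(n,m)` gives a repeat count. The following is a minimal sketch of that translation and is not part of the original disclosure. The function name `prosite_to_regex` and the test peptide are hypothetical, and the pattern string is a reading of the Bowman-Birk pattern above, so it should be checked against the PROSITE entry before real use.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        # Split off an optional repeat count such as (2) or (5,6).
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.group(1), m.group(2), m.group(3)
        if core == "x":
            core = "."                        # x: any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"    # {..}: any residue except these
        # [..] classes and single letters are already valid regex atoms.
        if low and high:
            core += "{" + low + "," + high + "}"
        elif low:
            core += "{" + low + "}"
        parts.append(core)
    return "".join(parts)


# Assumed reading of the Bowman-Birk consensus pattern.
BOWMAN_BIRK = ("C-x(5,6)-[DENQKRHSTA]-C-[PASTDH]-[PASTDK]-[ASTDV]"
               "-C-[NDKS]-[DEKRHSTA]-C")

regex = prosite_to_regex(BOWMAN_BIRK)
toy_peptide = "CAAAAADCPPACNDC"   # synthetic, built only to satisfy the pattern
print(regex)
print(bool(re.search(regex, toy_peptide)))
```

Because the note above says the pattern can occur twice in duplicated cereal inhibitors, `re.finditer` rather than `re.search` would be the natural call when scanning real sequences.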

[1] Laskowski M. Jr., Kato I. Annu. Rev. Biochem. 49:593-626(1980).

[2] Tashiro M., Hashino K., Shiozaki M., Ibuki F., Maki Z. J. Biochem. 102:297-306(1987).

[0265] 67. Pathogenesis-related protein Bet v I family signature
[0266] A number of plant proteins, which all seem to be involved in pathogen defense response, are structurally related [1,2,3]. These proteins are:

- Bet v I, the major pollen allergen from white birch. Bet v I is the main cause of type I allergic reactions in Europe, North America and USSR.
- Aln g I, the major pollen allergen from alder.
- Api g I, the major allergen from celery.
- Car b I, the major pollen allergen from hornbeam.
- Cor a I, the major pollen allergen from hazel.
- Mal d I, the major pollen allergen from apple.
- Asparagus wound-induced protein AoPR1.
- Kidney bean pathogenesis-related proteins 1 and 2.
- Parsley pathogenesis-related proteins PR1-1 and PR1-3.
- Pea disease resistance response proteins pI49, pI176 and DRRG49-C.
- Pea abscisic acid-responsive proteins ABR17 and ABR18.
- Potato pathogenesis-related proteins STH-2 and STH-21.
- Soybean stress-induced protein SAM22.

[0267] These proteins are thought to be intracellularly located. They contain from 155 to 160 amino acid residues. As a signature pattern, a conserved region located in the third quarter of these proteins has been selected.
Consensus pattern: G-x(2)-[LIVMF]-x(4)-E-x(2)-[CSTAEN]-x(8,9)-[GND]-G-[GS]-[CS]-x(2)-K-x(4)-[FY]

[1] Breiteneder H., Pettenburger K., Bito A., Valenta R., Kraft D., Rumpold H., Scheiner O., Breitenbach M. EMBO J. 8:1935-1938(1989).

[2] Crowell D.N., John M.E., Russell D., Amasino R.M. Plant Mol. Biol. 18:459-468(1992).
[3] Warner S.A.J., Scott R., Draper J. Plant Mol. Biol. 19:555-561(1992).

[0268] 68. bZIP transcription factors basic domain signature

The bZIP superfamily [1,2] of eukaryotic DNA-binding transcription factors groups together proteins that contain a basic region mediating sequence-specific DNA-binding followed by a leucine zipper required for dimerization. This family is quite large, therefore only a partial list of some representative members appears here.

- Transcription factor AP-1, which binds selectively to enhancer elements in the cis control regions of SV40 and metallothionein IIA. AP-1, also known as c-jun, is the cellular homolog of the avian sarcoma virus 17 (ASV17) oncogene v-jun.
- Jun-B and jun-D, probable transcription factors which are highly similar to jun/AP-1.
- The fos protein, a proto-oncogene that forms a non-covalent dimer with c-jun.
- The fos-related proteins fra-1 and fos B.
- Mammalian cAMP response element (CRE) binding proteins CREB, CREM, ATF-1, ATF-3, ATF-4, ATF-5, ATF-6 and LRF-1.
- Maize Opaque 2, a trans-acting transcriptional activator involved in the regulation of the production of zein proteins during endosperm.
- Arabidopsis G-box binding factors GBF1 to GBF4, parsley CPRF-1 to CPRF-3, tobacco TAF-1 and wheat EMBP-1. All these proteins bind the G-box promoter elements of many plant genes.
- Drosophila protein Giant, which represses the expression of both the kruppel and knirps segmentation gap genes.
- Drosophila Box B binding factor 2 (BBF-2), a transcriptional activator that binds to fat body-specific enhancers of alcohol dehydrogenase and yolk protein genes.
- Drosophila segmentation protein cap'n'collar (gene cnc), which is involved in head morphogenesis.
- Caenorhabditis elegans skn-1, a developmental protein involved in the fate of ventral blastomeres in the early embryo.
- Yeast GCN4 transcription factor, a component of the general control system that regulates the expression of amino acid-synthesizing enzymes in response to amino acid starvation, and the related Neurospora crassa cpc-1 protein.
- Neurospora crassa cys-3, which turns on the expression of structural genes which encode sulfur-catabolic enzymes.
- Yeast MET28, a transcriptional activator of sulfur amino acids metabolism.
- Yeast PDR4 (or YAP1), a transcriptional activator of the genes for some oxygen detoxification enzymes.
- Epstein-Barr virus trans-activator protein BZLF1.

Consensus pattern: [KR]-x(1,3)-[RKSAQ]-N-x(2)-[SAQ](2)-x-[RKTAENQ]-x-R-x-[RK]
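The signature above covers only the basic region; the text also describes a leucine zipper that follows it. As a rough illustration, the sketch below looks for a basic-region match and then for a heptad leucine repeat downstream. The zipper expression `L-x(6)-L-x(6)-L-x(6)-L` is a common heuristic assumed here, not something taken from this document, and the function name, the `max_gap` parameter and the toy sequence are all hypothetical.

```python
import re

# Basic-region signature, transcribed from the consensus pattern above.
BASIC_REGION = re.compile(r"[KR].{1,3}[RKSAQ]N.{2}[SAQ]{2}.[RKTAENQ].R.[RK]")
# Heptad leucine repeat: an assumed heuristic, not taken from this document.
LEUCINE_ZIPPER = re.compile(r"L.{6}L.{6}L.{6}L")


def find_bzip_candidate(seq, max_gap=20):
    """Return (basic_start, zipper_start) as 0-based indices, or None."""
    for basic in BASIC_REGION.finditer(seq):
        # Search for a zipper starting at or after the end of the basic region.
        zipper = LEUCINE_ZIPPER.search(seq, basic.end())
        if zipper and zipper.start() - basic.end() <= max_gap:
            return basic.start(), zipper.start()
    return None


# Synthetic sequence built to satisfy both expressions; not a real protein.
toy = "MM" + "KARNAASAARARAK" + "EE" + ("L" + "A" * 6) * 3 + "L" + "GG"
print(find_bzip_candidate(toy))
```

A hit from such a scan is only a candidate: the heptad heuristic produces many false positives, so it is best treated as a filter ahead of closer inspection.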

[1] Hurst H.C. Protein Prof. 2:105-168(1995).
[2] Ellenberger T. Curr. Opin. Struct. Biol. 4:12-21(1994).
[0269] 69. Biotin-requiring enzymes attachment site

Biotin, which plays a catalytic role in some carboxyl transfer reactions, is covalently attached, via an amide bond, to a lysine residue in enzymes requiring this coenzyme [1,2,3,4]. Such enzymes are:

- Pyruvate carboxylase (EC 6.4.1.1).
- Acetyl-CoA carboxylase (EC 6.4.1.2).
- Propionyl-CoA carboxylase (EC 6.4.1.3).
- Methylcrotonoyl-CoA carboxylase (EC 6.4.1.4).
- Geranoyl-CoA carboxylase (EC 6.4.1.5).
- Urea carboxylase (EC 6.3.4.6).
- Oxaloacetate decarboxylase (EC 4.1.1.3).
- Methylmalonyl-CoA decarboxylase (EC 4.1.1.41).
- Glutaconyl-CoA decarboxylase (EC 4.1.1.70).
- Methylmalonyl-CoA carboxyl-transferase (EC 2.1.3.1) (transcarboxylase).

Sequence data reveal that the region around the biocytin (biotin-lysine) residue is well conserved and can be used as a signature pattern.
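The EC numbers in the list above follow the four-field Enzyme Commission scheme, whose first field gives the top-level reaction class. The sketch below parses the numbers as listed and groups the biotin-requiring enzymes by that class. It is an illustration only: the helper names `parse_ec` and `group_by_class` are hypothetical, and the class table covers the six classes relevant here.

```python
from collections import defaultdict

# Top-level EC classes (first field of an EC number).
EC_CLASSES = {1: "oxidoreductase", 2: "transferase", 3: "hydrolase",
              4: "lyase", 5: "isomerase", 6: "ligase"}


def parse_ec(ec: str) -> tuple:
    """Parse 'EC a.b.c.d' into four integers, tolerating stray spaces."""
    fields = ec.replace("EC", "").replace(" ", "").split(".")
    if len(fields) != 4:
        raise ValueError(f"expected four fields in {ec!r}")
    return tuple(int(f) for f in fields)


def group_by_class(enzymes: dict) -> dict:
    """Map each top-level class name to the enzymes that belong to it."""
    groups = defaultdict(list)
    for name, ec in enzymes.items():
        groups[EC_CLASSES.get(parse_ec(ec)[0], "unknown")].append(name)
    return dict(groups)


BIOTIN_ENZYMES = {
    "Pyruvate carboxylase": "EC 6.4.1.1",
    "Acetyl-CoA carboxylase": "EC 6.4.1.2",
    "Propionyl-CoA carboxylase": "EC 6.4.1.3",
    "Methylcrotonoyl-CoA carboxylase": "EC 6.4.1.4",
    "Geranoyl-CoA carboxylase": "EC 6.4.1.5",
    "Urea carboxylase": "EC 6.3.4.6",
    "Oxaloacetate decarboxylase": "EC 4.1.1.3",
    "Methylmalonyl-CoA decarboxylase": "EC 4.1.1.41",
    "Glutaconyl-CoA decarboxylase": "EC 4.1.1.70",
    "Methylmalonyl-CoA carboxyl-transferase": "EC 2.1.3.1",
}

by_class = group_by_class(BIOTIN_ENZYMES)
for ec_class, names in sorted(by_class.items()):
    print(ec_class, len(names))
```

Grouping this way makes the pattern in the list explicit: the carboxylases are ligases, the decarboxylases are lyases, and the transcarboxylase is a transferase.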

[0270] Consensus pattern: [GN]-[DEGTR]-x-[LIVMFY]-x(2)-... [K is the biotin attachment site]. Note: the domain around the biotin-binding lysine residue is evolutionary related to that around the lipoyl-binding lysine residue of 2-oxo acid dehydrogenase acyltransferases.

[1] Knowles J.R. Annu. Rev. Biochem. 58:195-221(1989).

[2] Samols D., Thornton C.G., Murtif V.L., Kumar G.K., Haase F.C., Wood H.G. J. Biol. Chem. 263:6461-6464(1988).

[3] Goss N.H., Wood H.G. Meth. Enzymol. 107:261-278(1984).

[4] Shenoy B.C., Xie Y., Park V.L., Kumar G.K., Beegen H., Wood H.G., Samols D. J. Biol. Chem. 267:18407-18412(1992).

[0271] 2-oxo acid dehydrogenases acyltransferase component lipoyl binding site

The 2-oxo acid dehydrogenase multienzyme complexes [1,2] from bacterial and eukaryotic sources catalyze the oxidative decarboxylation of 2-oxo acids to the corresponding acyl-CoA. The three members of this family of multienzyme complexes are:

- Pyruvate dehydrogenase complex (PDC).
- 2-oxoglutarate dehydrogenase complex (OGDC).
- Branched-chain 2-oxo acid dehydrogenase complex (BCOADC).

These three complexes share a common architecture: they are composed of multiple copies of three component enzymes, E1, E2 and E3. E1 is a thiamine pyrophosphate-dependent 2-oxo acid dehydrogenase, E2 a dihydrolipamide acyltransferase, and E3 an FAD-containing dihydrolipamide dehydrogenase.

[0272] E2 acyltransferases have an essential cofactor, lipoic acid, which is covalently bound via an amide linkage to a lysine group. The E2 components of OGDC and BCOADC bind a single lipoyl group, while those of PDC bind either one (in yeast and in Bacillus), two (in mammals), or three (in Azotobacter and in Escherichia coli) lipoyl groups [3]. In addition to the E2 components of the three enzymatic complexes described above, a lipoic acid cofactor is also found in the following proteins:

- H-protein of the glycine cleavage system (GCS) [4]. GCS is a multienzyme complex of four protein components, which catalyzes the degradation of glycine. H protein shuttles the methylamine group of glycine from the P protein to the T protein. H-protein from either prokaryotes or eukaryotes binds a single lipoic group.
- Mammalian and yeast pyruvate dehydrogenase complexes differ from that of other sources, in that they contain, in small amounts, a protein of unknown function, designated protein X or component X. Its sequence is closely related to that of E2 subunits and seems to bind a lipoic group [5].
- Fast migrating protein (FMP) (gene acoC) from Alcaligenes eutrophus [6]. This protein is most probably a dihydrolipamide acyltransferase involved in acetoin metabolism.

A signature pattern was developed which allows the detection of the lipoyl-binding site.
[0273] Consensus pattern: [GN]-x(2)-[LIVF]-x(5)-...-x(5)-[GCN]-x-[LIVMFY] [K is the lipoyl-binding site]. Note: the domain around the lipoyl-binding lysine residue is evolutionary related to that around the biotin-binding lysine residue of biotin requiring enzymes.

[1] Yeaman S.J. Biochem. J. 257:625-632(1989).
[2] Yeaman S.J. Trends Biochem. Sci. 11:293-296(1986).
[3] Russell G.C., Guest J.R. Biochim. Biophys. Acta 1076:225-232(1991).

[4] Fujiwara K., Okamura-Ikeda K., Motokawa Y. J. Biol. Chem. 261:8838-8841(1986).

[5] Behal R.H., Browning K.S., Hall T.B., Reed L.J. Proc. Natl. Acad. Sci. U.S.A. 86:8732-8736(1989).

[6] Priefert H., Hein S., Krueger N., Zeh K., Schmidt B., Steinbuechel A. J. Bacteriol. 173:4056-4071(1991).

[0274] 70. C2 (C2 domain) Number of members: 295

Some isozymes of protein kinase C (PKC) [1,2] contain a domain, known as C2, of about 118 amino-acid residues, which is located between the two copies of the C1 domain (that bind phorbol esters and diacylglycerol) (see <PDOC00379>) and the protein kinase catalytic domain (see <PDOC00100>). Regions with significant homology [3,E1] to the C2-domain have been found in the following proteins:

- PKC isoforms alpha, beta and gamma and Drosophila isoforms PKC1 and PKC2.
- PKC isoforms delta, epsilon and eta, Caenorhabditis elegans kin-13 and yeast PKC1 have a C2-like domain at the N-terminal extremity [4].
- Yeast cAMP dependent protein kinase SCH9 contains a C2-like domain.
- Mammalian phosphatidylinositol-specific phospholipase C (PI-PLC) (see <PDOC50007>) isoforms beta, gamma and delta as well as several non-mammalian PI-PLCs have a C2-like domain C-terminal of the catalytic domain.
- Mammalian and plants phosphatidylinositol-3-kinase have a C2-like domain in the central region of the 110 Kd catalytic subunit.
- Yeast phosphatidylserine-decarboxylase 2 (gene PSD2) contains a C2 domain in its central region.
- Cytosolic phospholipase D from plants, and cytosolic phospholipase A2 have a C2-like domain at their N-terminus.
- Synaptotagmins (p65). This is a family of related synaptic vesicle proteins that bind acidic phospholipids and that may have a regulatory role in the membrane interactions during trafficking of synaptic vesicles at the active zone of the synapse. All isoforms of synaptotagmins have two copies of the C2 domain in their C-terminal region.
- Rabphilin-3A, a synaptic protein, contains two C2 domains.
- Caenorhabditis elegans protein unc-13, whose function is not known. Unc-13 has a C2 domain in its central part and a C2-like domain at the C-terminus.
- rasGAP and the breakpoint cluster protein bcr have a C2-domain C-terminal of a PH-domain.
- Yeast protein BUD2 (or CLA2) has a C2-domain in the central region.
- Yeast protein RSP5 and human protein NEDD-4. Both proteins also contain WW domains (see <PDOC50020>).
- Perforin (see <PDOC00251>) has a C2 domain at the C-terminus. It is the only extracellular protein known to contain a C2 domain.
- Yeast hypothetical protein YML072C has a C2 domain.
- Yeast hypothetical protein YNL087W has three C2 domains.
- Caenorhabditis elegans hypothetical protein F37A4.7 has two C2 domains.

The C2 domain is thought to be involved in calcium-dependent phospholipid binding [5]. Since domains related to the C2 domain are also found in proteins that do not bind calcium, other putative functions for the C2 domain, such as binding to inositol-1,3,4,5-tetraphosphate, have been suggested [6]. Recently, the 3D structure of the first C2 domain of synaptotagmin has been reported [7]; the domain forms an eight-stranded beta sandwich. The signature pattern that has been developed for the C2 domain is located in a conserved part of that domain, the connecting loop between beta strands 2 and 3. A profile has been developed for the C2 domain that covers the total domain.

- Consensus pattern: [ACG]-x(2)-L-x(2,3)-D-x(1,2)-[NGSTLIF]-[GTMR]-x-[STAP]-D-[PA]-[FY]

- Note: this documentation entry is linked to both a signature pattern and a profile. As the profile is much more sensitive than the pattern, you should use it if you have access to the necessary software tools to do so.

[0275] [1]Medline: 96367095 Extending the C2 domain family: C2s in PKCs delta, epsilon, eta and theta, phospholipases, GAPs and perforin. Ponting CP, Parker PJ; Protein Sci 1996;5:162-166.

[1] Azzi A., Boscoboinik D., Hensey C. Eur. J. Biochem. 208:547-557(1992).
[2] Stabel S. Semin. Cancer Biol. 5:277-284(1994).

[3] Brose N., Hofmann K.O., Hata Y., Suedhof T.C. J. Biol. Chem. 270:25273-25280(1995).
[4] Sossin W.S., Schwartz J.H. Trends Biochem. Sci. 18:207-208(1993).
[5] Davletov B.A., Suedhof T.C. J. Biol. Chem. 268:26386-26390(1993).

[6] Fukuda M., Aruga J., Niinobe M., Aimoto S., Mikoshiba K. J. Biol. Chem. 269:29206-29211(1994).
[7] Sutton R.B., Davletov B.A., Berghuis A.M., Suedhof T.C., Sprang S.R. Cell 80:929-938(1995).

[0276] 71. CAP (CAP protein) Number of members: 11
In budding and fission yeasts the CAP protein is a bifunctional protein whose N-terminal domain binds to adenylyl cyclase, thereby enabling that enzyme to be activated by upstream regulatory signals, such as Ras. The function of the C-terminal domain is less clear, but it is required for normal cellular morphology and growth control [1]. CAP is conserved in higher eukaryotic organisms where its function is not yet clear [2].

Structurally, CAP is a protein of 474 to 551 residues which consists of two domains separated by a proline-rich hinge. Two signature patterns, one corresponding to a conserved region in the N-terminal extremity and the other to a C-terminal region, have been developed.

- Consensus pattern: [LIVM](2)-x-R-L-[DE]-x(4)-R-L-E

- Consensus pattern: D-[LIVMFY]-x-E-x-[PA]-x-P-E-Q-[LIVMFY]-K

[1] Kawamukai M., Gerst J., Field J., Riggs M., Rodgers L., Wigler M., Young D. Mol. Biol. Cell 3:167-180(1992).
[2] Yu G., Swiston J., Young D. J. Cell Sci. 107:1671-1678(1994).

[0277] 72. CAP_GLY (CAP-Gly domain)
CAP stands for cytoskeleton-associated proteins. Swiss:P39937 may be a member but has not been included. It has a weak match to the family between residues 22-67. Number of members: 24

[1]Medline: 93242656 Sequence homologies between four cytoskeleton-associated proteins. Riehemann K, Sorg C; Trends Biochem Sci 1993;18:82-83.
[0278] It has been shown [1] that some cytoskeleton-associated proteins (CAP) share the presence of a conserved, glycine-rich domain of about 42 residues, called here CAP-Gly. Proteins known to contain this domain are listed below.

- Restin (also known as cytoplasmic linker protein-170 or CLIP-170), a 160 Kd protein associated with intermediate filaments and that links endocytic vesicles to microtubules. Restin contains two copies of the CAP-Gly domain.
- Vertebrate dynactin (150 Kd dynein-associated polypeptide; DAP) and Drosophila glued, a major component of activator I, a 20S polypeptide complex that stimulates dynein-mediated vesicle transport.
- Yeast protein BIK1, which seems to be required for the formation or stabilization of microtubules during mitosis and for spindle pole body fusion during conjugation.
- Yeast protein NIP100 (NIP30).
- Human protein CKAP1/TFCB, Schizosaccharomyces pombe protein alp11 and Caenorhabditis elegans hypothetical protein F53F4.3. These proteins contain an N-terminal ubiquitin domain (see <PDOC00271>) and a C-terminal CAP-Gly domain.
- Caenorhabditis elegans hypothetical protein M01A8.2.
- Yeast hypothetical protein YNL148c.

Structurally, these proteins are made of three distinct parts: an N-terminal section that is most probably globular and contains the CAP-Gly domain, a large central region predicted to be in an alpha-helical coiled-coil conformation and, finally, a short C-terminal globular domain. The signature for the CAP-Gly domain corresponds to the first 32 residues of the domain and includes five of the six conserved glycines.

- Consensus pattern: G-x(8,10)-[FYW]-x-G-[LIVM]-x-[LIVMFY]-x(4)-G-K-[NH]-x-G-[STAR]-x(2)-G-x(2)-[LY]-F
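The statement that the signature spans the first 32 residues and includes five glycines can be checked against the pattern itself by adding up the length of each element. The sketch below does that arithmetic. It is an illustration only: the helper names are hypothetical, and the pattern string is a reading of the consensus pattern above.

```python
import re

# Assumed reading of the CAP-Gly consensus pattern.
CAP_GLY = ("G-x(8,10)-[FYW]-x-G-[LIVM]-x-[LIVMFY]-x(4)-G-K-[NH]-x-G"
           "-[STAR]-x(2)-G-x(2)-[LY]-F")


def pattern_length_range(pattern: str) -> tuple:
    """Return the (shortest, longest) match length a pattern allows."""
    shortest = longest = 0
    for element in pattern.split("-"):
        # Capture an optional repeat count such as (4) or (8,10).
        m = re.fullmatch(r".+?(?:\((\d+)(?:,(\d+))?\))?", element)
        low = int(m.group(1)) if m.group(1) else 1
        high = int(m.group(2)) if m.group(2) else low
        shortest += low
        longest += high
    return shortest, longest


def fixed_residue_count(pattern: str, residue: str) -> int:
    """Count positions where the pattern requires exactly this residue."""
    return sum(1 for element in pattern.split("-") if element == residue)


print(pattern_length_range(CAP_GLY))
print(fixed_residue_count(CAP_GLY, "G"))
```

The only variable-length element is `x(8,10)`, so the shortest match is the 32 residues quoted in the text and the longest is two residues more.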

[1] Riehemann K., Sorg C. Trends Biochem. Sci. 18:82-83(1993).
[0279] 73. (CBD_1)

Cellulose-binding domain, fungal type

The microbial degradation of cellulose and xylans requires several types of enzymes such as endoglucanases (EC 3.2.1.4), cellobiohydrolases (EC 3.2.1.91) (exoglucanases), or xylanases (EC 3.2.1.8).

[0280] Structurally, cellulases and xylanases generally consist of a catalytic domain joined to a cellulose-binding domain (CBD) by a short linker sequence rich in proline and/or hydroxy-amino acids.

[0281] The CBD of a number of fungal cellulases has been shown to consist of 36 amino acid residues. Enzymes known to contain such a domain are:

- Endoglucanase I (gene egl1) from Trichoderma reesei.
- Endoglucanase II (gene egl2) from Trichoderma reesei.
- Endoglucanase V (gene egl5) from Trichoderma reesei.
- Exocellobiohydrolase I (gene CBHI) from Humicola grisea, Neurospora crassa, Phanerochaete chrysosporium, Trichoderma reesei, and Trichoderma viride.
- Exocellobiohydrolase II (gene CBHII) from Trichoderma reesei.
- Exocellobiohydrolase 3 (gene cel3) from Agaricus bisporus.
- Endoglucanases B, C2, F and K from Fusarium oxysporum.

[0282] The CBD domain is found either at the N-terminal (Cbh-II or egl2) or at the C-terminal extremity (Cbh-I, egl1 or egl5) of these enzymes. As shown in the following schematic representation, there are four conserved cysteines in this type of CBD domain, all involved in disulfide bonds.



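[Schematic representation; the lines that connected the disulfide-bonded cysteines and marked the position of the pattern are not recoverable from the source.]

xxxxxxxCxxxxxxxxxxCxxxxxCxxxxxxxxxCx

'C': conserved cysteine involved in a disulfide bond.
'*': position of the pattern.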
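The schematic line can be read mechanically to list where the conserved cysteines sit and how many residues separate them. The sketch below does so. It is an illustration only: the helper names are hypothetical, the string is copied from the schematic as printed, and the pairing of cysteines into disulfide bonds is not derived because the connecting lines did not survive.

```python
# Schematic line as printed above: 'x' is any residue, 'C' a conserved cysteine.
SCHEMATIC = "xxxxxxxCxxxxxxxxxxCxxxxxCxxxxxxxxxCx"


def cysteine_positions(schematic: str) -> list:
    """Return 1-based positions of every 'C' in the schematic."""
    return [i + 1 for i, ch in enumerate(schematic) if ch == "C"]


def spacings(positions: list) -> list:
    """Number of residues lying between consecutive cysteines."""
    return [b - a - 1 for a, b in zip(positions, positions[1:])]


positions = cysteine_positions(SCHEMATIC)
print(positions)
print(spacings(positions))
```

Comparing the printed spacings with the `x(n)` counts between the C's of the consensus pattern is a quick consistency check on both.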

[0283] Such a domain has also been found in a putative polysaccharide binding protein from the red alga Porphyra purpurea [2]. Structurally, this protein consists of four tandem repeats of the CBD domain.

[0284] Consensus pattern: C-G-G-x(4,7)-G-x(3)-C-x(5)-C-x(3,5)-[NHG]-x-[FYWM]-x(2)-Q-C [The four C's are involved in disulfide bonds.] Sequences known to belong to this class detected by the pattern: ALL.

[1] Gilkes N.R., Henrissat B., Kilburn D.G., Miller R.C. Jr., Warren R.A.J. Microbiol. Rev. 55:303-315(1991).

[2] Liu Q., der Meer J.P., Reith M.E.

[0285] 74. CBS domain. 3D structure found as a subdomain in the TIM barrel of inosine-monophosphate dehydrogenase. CBS domains are small intracellular modules, mostly found in two or four copies within a protein. CBS domains are found in cystathionine-beta-synthase (CBS), where mutations lead to homocystinuria. Two CBS domains are found in inosine-monophosphate dehydrogenase from all species; however, the CBS domains are not needed for activity. Two CBS domains are found in intracellular loops of several chloride channels. Mutations in this domain of Swiss:P35520 lead to homocystinuria. Number of members: 414

[1]Medline: 97172695 The structure of a domain common to archaebacteria and the homocystinuria disease protein. Bateman A; Trends Biochem Sci 1997;22:12-13.

[2]Medline: 962/9838 Structure and mechanism of inosine monophosphate dehydrogenase in complex with the immunosuppressant mycophenolic acid. Sintchak MD, Fleming MA, Futer O, Raybuck SA, Chambers SP, Caron PR, Murcko MA, Wilson KP; Cell 1996;85:921-930.
Discovery of CBS domain.

[3]Medline: 97259972 CBS domains in ClC chloride channels implicated in myotonia and nephrolithiasis (kidney stones). Ponting CP; J Mol Med 1997;75:160-163.

[0286] 75. CDP-OH_P_transf (CDP-alcohol phosphatidyltransferase)
All of these members have the ability to catalyze the displacement of CMP from a CDP-alcohol by a second alcohol with formation of a phosphodiester bond and concomitant breaking of a phosphatide anhydride bond. Number of members: 32

A number of phosphatidyltransferases, which are all involved in phospholipid biosynthesis and that share the property of catalyzing the displacement of CMP from a CDP-alcohol by a second alcohol with formation of a phosphodiester bond and concomitant breaking of a phosphoride anhydride bond, share a conserved sequence region [1,2]. These enzymes are:

- Ethanolaminephosphotransferase (EC 2.7.8.1) from yeast (gene EPT1).
- Diacylglycerol cholinephosphotransferase (EC 2.7.8.2) from yeast (gene CPT1).
- Phosphatidylglycerophosphate synthase (EC 2.7.8.5) (CDP-diacylglycerol--glycerol-3-phosphate 3-phosphatidyltransferase) from bacteria (gene pgsA).
- Phosphatidylserine synthase (EC 2.7.8.8) (CDP-diacylglycerol--serine O-phosphatidyltransferase) from yeast (gene CHO1) and from Bacillus subtilis (gene pssA).
- Phosphatidylinositol synthase (EC 2.7.8.11) (CDP-diacylglycerol--inositol 3-phosphatidyltransferase) from yeast (gene PIS).

These enzymes are proteins of from 200 to 400 amino acid residues. The conserved region contains three aspartic acid residues and is located in the N-terminal section of the sequences.

- Consensus pattern: D-G-x(2)-A-R-x(8)-G-x(3)-D-x(3)-D

[1]Medline: 97075020 Two-dimensional 1H-NMR of transmembrane peptides from Escherichia coli phosphatidylglycerophosphate synthase in micelles. Morein S, Trouard TP, Hauksson JB, Rilfors L, Arvidson G, Lindblom G; Eur J Biochem 1996;241:489-497.

[1] Nikawa J.-I., Kodaki T., Yamashita S. J. Biol. Chem. 262:4878-4881(1987).
[2] Hjelmstad R.H., Bell R.M. J. Biol. Chem. 266:5094-6134(1991).

[0287] 76. CHOD (Cholesterol oxidase) Members of the GMC oxidoreductase family. Number of members: 3
[0288] [1]Medline: 94032271. Crystal structure of cholesterol oxidase complexed with a steroid substrate: implications for flavin adenine dinucleotide dependent alcohol oxidases. Li J, Vrielink A, Brick P, Blow DM; Biochemistry 1993;32:11507-11515.

[0289] The following FAD flavoproteins oxidoreductases have been found [1,2] to be evolutionary related. These enzymes, which are called 'GMC oxidoreductases', are listed below.

- Glucose oxidase (EC 1.1.3.4) (GOX) from Aspergillus niger. Reaction catalyzed: glucose + oxygen -> delta-gluconolactone + hydrogen peroxide.
- Methanol oxidase (EC 1.1.3.13) (MOX) from fungi. Reaction catalyzed: methanol + oxygen -> acetaldehyde + hydrogen peroxide.
- Choline dehydrogenase (EC 1.1.99.1) (CHD) from bacteria. Reaction catalyzed: choline + unknown acceptor -> betaine acetaldehyde + reduced acceptor.
- Glucose dehydrogenase (GLD) (EC 1.1.99.10) from Drosophila. Reaction catalyzed: glucose + unknown acceptor -> delta-gluconolactone + reduced acceptor.
- Cholesterol oxidase (CHOD) (EC 1.1.3.6) from Brevibacterium sterolicum and Streptomyces strain SA-COO. Reaction catalyzed: cholesterol + oxygen -> cholest-4-en-3-one + hydrogen peroxide.
- AlkJ [3], an alcohol dehydrogenase from Pseudomonas oleovorans, which converts aliphatic medium-chain-length alcohols into aldehydes.

This family also includes a lyase:

- (R)-mandelonitrile lyase (EC 4.1.2.10) (hydroxynitrile lyase) from plants [4], an enzyme involved in cyanogenesis, the release of hydrogen cyanide from injured tissues.

These enzymes are proteins of size ranging from 556 (CHD) to 664 (MOX) amino acid residues which share a number of regions of sequence similarities. One of these regions, located in the N-terminal section, corresponds to the FAD ADP-binding domain. The function of the other conserved domains is not yet known; two of these domains have been selected as signature patterns. The first one is located in the N-terminal section of these enzymes, about 50 residues after the ADP-binding domain, while the second one is located in the central section.

- Consensus pattern: [GA]-[RKN]-x-[LIV]-G(2)-[GST](2)-x-[LIVM]-N-x(3)-[FYWA]-x(2)-[PAG]-x(5)-[DNESH]

- Consensus pattern: [GS]-[PSTA]-x(2)-[ST]-P-x-[LIVM](2)-x(2)-S-G-[LIVM]-G

[1] Cavener D.R. J. Mol. Biol. 223:811-814(1992).
[2] Henikoff S., Henikoff J.G. Genomics 19:97-107(1994).
[3] van Beilen J.B., Eggink G., Enequist H., Bos R., Witholt B. Mol. Microbiol. 6:3121-3136(1992).

[4] Cheng I.P., Poulton J.E. Plant Cell Physiol. 34:1139-1143(1993).

[0290] 77. CKS (Cyclin-dependent kinase regulatory subunit) Number of members: 11. Cyclin-dependent kinases (CDK) are protein kinases which associate with cyclins to regulate eukaryotic cell cycle progression. The best known CDK is p34-cdc2 (CDC28 in yeast), which is required for entry into S-phase and mitosis. CDKs bind to a regulatory subunit which is essential for their biological function. This regulatory subunit is a small protein of 79 to 150 residues. In yeast (gene CKS1) and in fission yeast (gene suc1) a single isoform is known, while mammals have two highly related isoforms. It has been shown [1] that these CDK regulatory subunits assemble as a hexamer which then acts as a hub for the oligomerization of six CDK catalytic subunits. The sequences of CDK regulatory subunits are highly conserved; therefore, the two most conserved regions have been used as signature patterns.

- Consensus pattern: Y-S-x-[KR]-Y-x-[DE](2)-x-[FY]-E-Y-R-H-V-x-[LV]-[PT]-[KRP]

- Consensus pattern: H-x-P-E-x-H-[IV]-L-L-F-[KR]

[0291] [1] Parge H.E., Arvai A.S., Murtari D.J., Reed S.I., Tainer J.A. Science 262:387-395(1993).
[0292] 78. CK_II_beta (Casein kinase II regulatory subunit)

Number of members: 16. Casein kinase II (CK-2) [1] is a ubiquitous eukaryotic serine/threonine protein kinase which is found both in the cytoplasm and the nucleus and whose substrates are numerous. It generally phosphorylates Ser or Thr at the N-terminal of a stretch of acidic residues (see <PDOC00006>). CK-2 exists as a heterotetramer composed of two catalytic subunits (alpha) and two regulatory subunits (beta). In most species there are two closely related isoforms of the catalytic subunit: alpha and alpha'. Some species, such as fungi and plants, express two forms of regulatory subunits: beta and beta'. The exact function of the regulatory subunit is not yet known. It is a highly conserved protein of about 25 Kd that contains, in its central section, a cysteine-rich motif that could be involved in binding a metal such as zinc [2]. This region has been used as a signature pattern.

- Consensus pattern: C-P-x-[LIVMY]-x-C-x(5)-[LI]-P-[LIVMC]-G-x(9)-V-[KR]-x(2)-C-P-x-C
[1] Allende J.E., Allende C.C. FASEB J. 9:313-323(1995).






[2] Reed J.C., Bidwai A.P., Glover C.V.C. J. Biol. Chem. 269:18192-18200(1994).
[0293] 79. CLP_protease (Clp protease)

These proteins belong to family S14 in the classification of peptidases.

- The Clp protease has an active site catalytic triad. In E. coli Clp protease, ser-111, his-136 and asp-185 form the catalytic triad.
- Swiss:P48254 has lost all of these active site residues and is therefore inactive.
- Swiss:P42379 contains two large insertions, Swiss:P42380 contains one large insertion. Number of members: 138

[0294] The endopeptidase Clp (EC 3.4.21.92) from Escherichia coli cleaves peptides in various proteins in a process that requires ATP hydrolysis [1,2]. Clp is a dimeric protein which consists of a proteolytic subunit (gene clpP) and either of two related ATP-binding regulatory subunits (genes clpA and clpX). ClpP is a serine protease which has a chymotrypsin-like activity. Its catalytic activity seems to be provided by a charge relay system similar to that of the trypsin family of serine proteases, but which evolved by independent convergent evolution. Proteases highly similar to ClpP have been found to be encoded in the genome of the chloroplast of plants and seem to be also present in other eukaryotes. The sequences around two of the residues involved in the catalytic triad (a serine and a histidine) are highly conserved and can be used as signature patterns specific to that category of proteases.

- Consensus pattern: T-x(2)-[LIVMF]-G-x-A-[SAC]-S-[MSA]-[PAG]-[STA] [S is the active site residue]

- Consensus pattern: R-x(3)-[EAP]-x(3)-[LIVMFYT]-M-[LIVM]-H-Q-P [H is the active site residue]
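Each of the two patterns above marks one residue as the active site. When a pattern is used for scanning, a named group in the regular expression can report where that residue falls in the sequence. The sketch below illustrates the idea. The function name and the toy sequence are hypothetical: the sequence is synthetic and built only to satisfy both patterns, so the reported positions have no relation to ser-111 or his-136 of the E. coli enzyme.

```python
import re

# Regular expressions transcribed from the two consensus patterns above;
# the named group 'active' captures the annotated active-site residue.
SER_SITE = re.compile(r"T.{2}[LIVMF]G.A[SAC](?P<active>S)[MSA][PAG][STA]")
HIS_SITE = re.compile(r"R.{3}[EAP].{3}[LIVMFYT]M[LIVM](?P<active>H)QP")


def active_site_positions(seq: str) -> dict:
    """Return 1-based positions of the active-site Ser and His, if found."""
    hits = {}
    for name, rx in (("Ser", SER_SITE), ("His", HIS_SITE)):
        m = rx.search(seq)
        if m:
            hits[name] = m.start("active") + 1
    return hits


# Synthetic sequence, not a real ClpP.
toy = "GG" + "TAALGAASSMPS" + "GGGG" + "RAAAEAAALMLHQP" + "GG"
print(active_site_positions(toy))
```

A sequence that matches neither pattern yields an empty dictionary, which is the outcome expected for a protein such as Swiss:P48254 that has lost these active site residues.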

[1]Medline: 98050920. The structure of ClpP at 2.3 angstroms resolution suggests a model for ATP-dependent proteolysis. Wang J, Hartling JA, Flanagan JM; Cell 1997;91:447-456.
[1] Maurizi M.R., Clark W.P., Kim S.-H., Gottesman S. J. Biol. Chem. 265:12546-12552(1990).

[2] Gottesman S., Maurizi M.R. Microbiol. Rev. 56:592-621(1992).
[3] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:19-61(1994).

[0295] 80. CNG_membrane (Transmembrane region cyclic Nucleotide Gated Channel)
[1]Medline: 94224763. Cyclic nucleotide-gated channels: an expanding new family of ion channels. Yau KW; Proc Natl Acad Sci USA 1994;91:3481-3483.

This family is found to the N-terminus of the cNMP_binding. Number of members: 56. Proteins that bind cyclic nucleotides (cAMP or cGMP) share a structural domain of about 120 residues [1-3]. The best studied of these proteins is the prokaryotic catabolite gene activator (also known as the cAMP receptor protein) (gene crp), where such a domain is known to be composed of three alpha-helices and a distinctive eight-stranded, antiparallel beta-barrel structure. Such a domain is known to exist in the following proteins:

- Prokaryotic catabolite gene activator protein (CAP).
- cAMP- and cGMP-dependent protein kinases (cAPK and cGPK). Both types of kinases contain two tandem copies of the cyclic nucleotide-binding domain. The cAPKs are composed of two different subunits: a catalytic chain and a regulatory chain which contains both copies of the domain. The cGPKs are single chain enzymes that include the two copies of the domain in their N-terminal section. The nucleotide specificity of cAPK and cGPK is due to an amino acid in the conserved region of beta-barrel 7: a threonine that is invariant in cGPK is an alanine in most cAPK.
- Vertebrate cyclic nucleotide-gated ion-channels. Two such cation channels have been fully characterized. One is found in rod cells, where it plays a role in visual signal transduction. It specifically binds to cGMP, leading to an opening of the channel and thereby causing a depolarization of rod photoreceptors. In olfactory epithelium a similar, cAMP-binding channel plays a role in odorant signal transduction.

There are six invariant amino acids in this domain, three of which are glycine residues that are thought to be essential for maintenance of the structural integrity of the beta-barrel. Two signature patterns have been developed for this domain. The first pattern is located within beta-barrels 2 and 3 and contains the first two conserved Gly. The second pattern is located within beta-barrels 6 and 7 and contains the third conserved Gly as well as the three other invariant residues.

- Consensus pattern: [LIVM]-[VIC]-x(2)-G-[DENQTA]-x-[GAC]-x(2)-[LIVMFY](4)-x(2)-G

- Consensus pattern: [LIVMF]-G-E-x-[GAS]-[LIVM]-x(5,11)-R-[STAQ]-A-x-[LIVMA]-x-[STACV]

[1] Weber I.T., Shabb J.B., Corbin J.D. Biochemistry 28:6122-6127(1989).
[2] Kaupp U.B. Trends Neurosci. 14:150-157(1991).
[3] Shabb J.B., Corbin J.D. J. Biol. Chem. 267:5723-5726(1992).
[0296] 81. COX10_ctaB_cyoE (Cytochrome c oxidase assembly factor)

[1]

Medline: 95191390

Biosynthesis and functional role of haem O and haem A.

Mogi T, Saiki K, Anraku Y;
Mol Microbiol 1994;14:391-398.

Cytochrome c oxidase is a multi-subunit enzyme. The complexity of this enzyme requires assistance in building the
complex.

This is carried out by the cytochrome c oxidase assembly factor.
Number of members: 31

[0297] Cytochrome c oxidase is an oligomeric enzymatic complex which seems to require the aid of a number of
proteins that either act as chaperonins to help the subunits of the enzyme to fold correctly, or assist in the assembly
of the metal centers [1]. One of these subunits is known as COX10 in yeast and as ctaB [2] in aerobic prokaryotes. It
is evolutionarily related to the cyoE protein from the Escherichia coli cytochrome O terminal oxidase complex.
[0298] These proteins probably contain [3] seven transmembrane segments. The most conserved region is located
in a loop between the second and third of these segments and has been selected as a signature pattern.

- Consensus pattern: [ED]-x-Q-x(2)-M-x-R-T-x(2)-R-x(4)-G

[1] Nobrega M.P., Nobrega F.G., Tzagoloff A. J. Biol. Chem. 265:14220-14226(1990).
[2] Cao J., Hosler J., Shapleigh J., Revzin A., Ferguson-Miller S. J. Biol. Chem. 267:24273-24278(1992).
[3] Chepuri V., Gennis R.B. J. Biol. Chem. 265:12978-12986(1990).

[0299] 82. COX3 (Cytochrome c oxidase subunit III)
This family corresponds to chains c and p.
[1]

Medline: 96216288
The whole structure of the 13-subunit oxidized cytochrome c oxidase at 2.8 A.
Tsukihara T, Aoyama H, Yamashita E, Tomizaki T, Yamaguchi H,
Shinzawa-Itoh K, Nakashima R, Yaono R, Yoshikawa S;
Science 1996;272:1136-1144.
Number of members: ^24 

[0300] 83. COX5B (Cytochrome c oxidase subunit Vb)

[1]

Medline: 96216288

The whole structure of the 13-subunit oxidized cytochrome c oxidase at 2.8 A.

Tsukihara T, Aoyama H, Yamashita E, Tomizaki T, Yamaguchi H, Shinzawa-Itoh K, Nakashima R, Yaono R, Yoshikawa S;
Science 1996;272:1136-1144.
This family consists of chains F and S.
Number of members: 10

[0301] Cytochrome c oxidase (EC 1.9.3.1) [1] is an oligomeric enzymatic complex which is a component of the
respiratory chain complex and is involved in the transfer of electrons from cytochrome c to oxygen. In eukaryotes this
enzyme complex is located in the mitochondrial inner membrane; in aerobic prokaryotes it is found in the plasma
membrane. In addition to the three large subunits that form the catalytic center of the enzyme complex, there are, in
eukaryotes, a variable number of small polypeptidic subunits. One of these subunits, which is known as Vb in mammals,
V in slime mold and IV in yeast, binds a zinc atom. The sequence of subunit Vb is well conserved and includes three
conserved cysteines that are thought to coordinate the zinc ion [2]. Two of these cysteines are clustered in the
C-terminal section of the subunit; this region has been selected as a signature pattern.

- Consensus pattern: [LIVM](2)-[FYW]-x(10)-C-x(2)-C-G-x(2)-[FY]-K-L [The two C's probably bind zinc]

[1] Capaldi R.A., Malatesta F., Darley-Usmar V.M. Biochim. Biophys. Acta 726:135-146(1983).
[2] Rizzuto R., Sandona D., Brini M., Capaldi R.A., Bisson R. Biochim. Biophys. Acta 1129:100-104(1991).

[0302] 84. COesterase (Carboxylesterases)
Cholinesterase pages

The prints entry is specific to acetylcholinesterase.
Number of members: 273 

[0303] Higher eukaryotes have many distinct esterases. Among the different types are those which act on carboxylic
esters (EC 3.1.1). Carboxylesterases have been classified into three categories (A, B and C) on the basis of
differential patterns of inhibition by organophosphates. The sequence of a number of type-B carboxylesterases indicates
[1,2,3] that the majority are evolutionarily related. This family currently consists of the following proteins:

- Acetylcholinesterase (EC 3.1.1.7) (AChE) [E1] from vertebrates and from Drosophila.
- Mammalian cholinesterase II (butyryl cholinesterase) (EC 3.1.1.8). Acetylcholinesterase and cholinesterase II are
closely related enzymes that hydrolyze choline esters [4].
- Mammalian liver microsomal carboxylesterases (EC 3.1.1.1).
- Drosophila esterase 6, produced in the anterior ejaculatory duct of the male insect reproductive system where it
plays an important role in its reproductive biology.
- Drosophila esterase P.
- Culex pipiens (mosquito) esterases B1 and B2.
- Myzus persicae (peach-potato aphid) esterases E4 and FE4.
- Mammalian bile-salt-activated lipase (BAL) [5], a multifunctional lipase which catalyzes fat and vitamin absorption.
It is activated by bile salts in infant intestine where it helps to digest milk fats.
- Insect juvenile hormone esterase (JH esterase) (EC 3.1.1.59).
- Lipases (EC 3.1.1.3) from the fungi Geotrichum candidum and Candida rugosa.
- Caenorhabditis gut esterase (gene ges-1).
- Duck fatty acyl-CoA hydrolase, medium chain (EC 3.1.2.14), an enzyme that may be associated with peroxisome
proliferation and may play a role in the production of 3-hydroxy fatty acid diester pheromones.
- Membrane-enclosed crystal proteins from slime mold. These proteins are most probably esterases; the vesicles
where they are found have therefore been termed esterosomes.

[0304] So far two bacterial proteins have been found to belong to this family:

- Phenmedipham hydrolase (phenylcarbamate hydrolase), an Arthrobacter oxidans plasmid-encoded enzyme (gene
pcd) that degrades the phenylcarbamate herbicides phenmedipham and desmedipham by hydrolyzing their central
carbamate linkages.
- Para-nitrobenzyl esterase from Bacillus subtilis (gene pnbA).

[0305] The following proteins, while having lost their catalytic activity, contain a domain evolutionarily related to that
of carboxylesterases type-B:

- Thyroglobulin (TG), a glycoprotein specific to the thyroid gland, which is the precursor of the iodinated thyroid
hormones thyroxine (T4) and triiodothyronine (T3).
- Drosophila protein neurotactin (gene nrt), which may mediate or modulate cell adhesion between embryonic cells
during development.
- Drosophila protein glutactin (gene glt), whose function is not known.

[0306] As is the case for lipases and serine proteases, the catalytic apparatus of esterases involves three residues
(catalytic triad): a serine, a glutamate or aspartate and a histidine. The sequence around the active site serine is well
conserved and can be used as a signature pattern. A conserved region located in the N-terminal section containing a
cysteine involved in a disulfide bond has been selected as a second signature pattern.

- Consensus pattern: F-[GR]-G-x(4)-[LIVM]-x-[LIV]-x-G-x-S-[STAG]-G [S is the active site residue]

- Consensus pattern: [ED]-D-C-L-[YT]-[LIV]-[DNS]-[LIV]-[LIVFYW]-x-[PQR] [C is involved in a disulfide bond]
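Because the first carboxylesterase pattern marks the active site serine, a match can be used to report the position of that residue as well as the motif itself. The sketch below is one possible way to do this with a named group in a hand-written regular expression; the names `ESTERASE_SER` and `active_site_serines` and the test sequence are assumptions made for illustration only.

```python
import re

# Hand-written regex for F-[GR]-G-x(4)-[LIVM]-x-[LIV]-x-G-x-S-[STAG]-G;
# the named group "ser" captures the serine annotated as the active site residue.
ESTERASE_SER = re.compile(r"F[GR]G.{4}[LIVM].[LIV].G.(?P<ser>S)[STAG]G")

def active_site_serines(sequence: str) -> list[int]:
    """Return 1-based positions of the putative active site serine for every motif match."""
    return [m.start("ser") + 1 for m in ESTERASE_SER.finditer(sequence)]

# Hypothetical sequence constructed only to exercise the pattern.
toy_sequence = "MKFGGAAAALAIAGASSGT"
print(active_site_serines(toy_sequence))
```

Positions are reported 1-based to follow the usual convention for protein residues.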

[1] Myers M., Richmond R.C., Oakeshott J.G. Mol. Biol. Evol. 5:113-119(1988).
[2] Krejci E., Duval N., Chatonnet A., Vincens P., Massoulie J. Proc. Natl. Acad. Sci. U.S.A. 88:6647-6651(1991).
[3] Cygler M., Schrag J.D., Sussman J.L., Harel M., Silman I., Gentry M.K., Doctor B.P. Protein Sci. 2:366-382(1993).
[4] Lockridge O. BioEssays 9:125-128(1988).
[5] Wang C.-S., Hartsuck J.A. Biochim. Biophys. Acta 1166:1-19(1993).

[0307] 85. CPSase_L_chain (Carbamoyl-phosphate synthase (CPSase))
[1]

Medline: 94347758

Three-dimensional structure of the biotin carboxylase subunit of acetyl-CoA carboxylase.
Waldrop GL, Rayment I, Holden HM;
Biochemistry 1994;33:10249-10256.

[2]

Medline: 90285162

Mammalian carbamyl phosphate synthetase (CPS). DNA sequence and evolution of the CPS domain of the Syrian
hamster multifunctional protein CAD.
Simmer JP, Kelly RE, Rinker AG Jr, Scully JL, Evans DR;
J Biol Chem 1990;265:10395-10402.
Carbamoyl-phosphate synthase catalyzes the ATP-dependent synthesis of carbamyl-phosphate from glutamine or
ammonia and bicarbonate. This important enzyme initiates both the urea cycle and the biosynthesis of arginine and/
or pyrimidines [2].

The carbamoyl-phosphate synthase (CPS) enzyme in prokaryotes is a heterodimer of a small and large chain. The
small chain promotes the hydrolysis of glutamine to ammonia, which is used by the large chain to synthesize carbamoyl
phosphate. See CPSase_sm_chain.

The small chain has a GATase domain in the carboxyl terminus.
See GATase.

Number of members: 181

[0308] Carbamoyl-phosphate synthase (CPSase) catalyzes the ATP-dependent synthesis of carbamyl-phosphate
from glutamine (EC 6.3.5.5) or ammonia (EC 6.3.4.16) and bicarbonate [1]. This important enzyme initiates both the
urea cycle and the biosynthesis of arginine and pyrimidines.

[0309] Glutamine-dependent CPSase (CPSase II) is involved in the biosynthesis of pyrimidines and purines. In
bacteria such as Escherichia coli, a single enzyme is involved in both biosynthetic pathways while other bacteria have
separate enzymes. The bacterial enzymes are formed of two subunits: a small chain (gene carA) that provides
glutamine amidotransferase activity (GATase) necessary for removal of the ammonia group from glutamine, and a
large chain (gene carB) that provides CPSase activity. Such a structure is also present in fungi for arginine biosynthesis
(genes CPA1 and CPA2). In most eukaryotes, the first three steps of pyrimidine biosynthesis are catalyzed by a large
multifunctional enzyme, called URA2 in yeast, rudimentary in Drosophila and CAD in mammals [2]. The CPSase
domain is located between an N-terminal GATase domain and the C-terminal part, which encompasses the dihydroorotase
and aspartate transcarbamylase activities.

[0310] Ammonia-dependent CPSase (CPSase I) is involved in the urea cycle in ureolytic vertebrates; it is a
monofunctional protein located in the mitochondrial matrix.
[0311] The CPSase domain is typically 120 Kd in size and has arisen from the duplication of an ancestral subdomain
of about 500 amino acids. Each subdomain independently binds to ATP and it is suggested that the two homologous
halves act separately, one to catalyze the phosphorylation of bicarbonate to carboxy phosphate and the other that of
carbamate to carbamyl phosphate.

[0312] The CPSase subdomain is also present in a single copy in the biotin-dependent enzymes acetyl-CoA
carboxylase (EC 6.4.1.2) (ACC), propionyl-CoA carboxylase (EC 6.4.1.3) (PCCase), pyruvate carboxylase (EC 6.4.1.1)
(PC) and urea carboxylase (EC 6.3.4.6).

[0313] Two conserved regions which are probably important for binding ATP and/or catalytic activity have been se- 
lected as signatures for the subdomain. 

- Consensus pattern: [FYV]-[PS]-[LIVMC]-[LIVMA]-[LIVM]-[KR]-[PSA]-[STA]-x(3)-[SG]-G-x-[AG]
- Consensus pattern: [LIVMF]-[LIMN]-E-[LIVMCA]-N-[PATLIVM]-[KR]-[LIVMSTAC]

[1] Simmer J.P., Kelly R.E., Rinker A.G. Jr., Scully J.L., Evans D.R. J. Biol. Chem. 265:10395-10402(1990).
[2] Davidson J.N., Chen K.C., Jamison R.S., Musmanno L.A., Kern C.B. BioEssays 15:157-164(1993).

[0314] 86. CPSase_sm_chain (Carbamoyl-phosphate synthase small chain, CPSase domain)
[1]

Medline: 90285162

Mammalian carbamyl phosphate synthetase (CPS). DNA sequence and evolution of the CPS domain of the Syrian
hamster multifunctional protein CAD.
Simmer JP, Kelly RE, Rinker AG Jr, Scully JL, Evans DR;
J Biol Chem 1990;265:10395-10402.
The carbamoyl-phosphate synthase domain is in the amino terminus of the protein.
Carbamoyl-phosphate synthase catalyzes the ATP-dependent synthesis of carbamyl-phosphate from glutamine or
ammonia and bicarbonate. This important enzyme initiates both the urea cycle and the biosynthesis of arginine and/
or pyrimidines [1].

The carbamoyl-phosphate synthase (CPS) enzyme in prokaryotes is a heterodimer of a small and large chain. The
small chain promotes the hydrolysis of glutamine to ammonia, which is used by the large chain to synthesize carbamoyl
phosphate. See CPSase_L_chain.

The small chain has a GATase domain in the carboxyl terminus.
See GATase.

Number of members: 46

[0315] Carbamoyl-phosphate synthase (CPSase) catalyzes the ATP-dependent synthesis of carbamyl-phosphate
from glutamine (EC 6.3.5.5) or ammonia (EC 6.3.4.16) and bicarbonate [1]. This important enzyme initiates both the
urea cycle and the biosynthesis of arginine and pyrimidines.

[0316] Glutamine-dependent CPSase (CPSase II) is involved in the biosynthesis of pyrimidines and purines. In
bacteria such as Escherichia coli, a single enzyme is involved in both biosynthetic pathways while other bacteria have
separate enzymes. The bacterial enzymes are formed of two subunits: a small chain (gene carA) that provides
glutamine amidotransferase activity (GATase) necessary for removal of the ammonia group from glutamine, and a
large chain (gene carB) that provides CPSase activity. Such a structure is also present in fungi for arginine biosynthesis
(genes CPA1 and CPA2). In most eukaryotes, the first three steps of pyrimidine biosynthesis are catalyzed by a large
multifunctional enzyme, called URA2 in yeast, rudimentary in Drosophila and CAD in mammals [2]. The CPSase
domain is located between an N-terminal GATase domain and the C-terminal part, which encompasses the dihydroorotase
and aspartate transcarbamylase activities.

[0317] Ammonia-dependent CPSase (CPSase I) is involved in the urea cycle in ureolytic vertebrates; it is a
monofunctional protein located in the mitochondrial matrix.

[0318] The CPSase domain is typically 120 Kd in size and has arisen from the duplication of an ancestral subdomain 
of about 500 amino acids. Each subdomain independently binds to ATP and it is suggested that the two homologous 
halves act separately, one to catalyze the phosphorylation of bicarbonate to carboxy phosphate and the other that of 
carbamate to carbamyl phosphate. 

[0319] The CPSase subdomain is also present in a single copy in the biotin-dependent enzymes acetyl-CoA
carboxylase (EC 6.4.1.2) (ACC), propionyl-CoA carboxylase (EC 6.4.1.3) (PCCase), pyruvate carboxylase (EC 6.4.1.1)
(PC) and urea carboxylase (EC 6.3.4.6).

[0320] Two conserved regions which are probably important for binding ATP and/or catalytic activity have been se- 
lected as signatures for the subdomain. 

- Consensus pattern: [FYV]-[PS]-[LIVMC]-[LIVMA]-[LIVM]-[KR]-[PSA]-[STA]-x(3)-[SG]-G-x-[AG]

- Consensus pattern: [LIVMF]-[LIMN]-E-[LIVMCA]-N-[PATLIVM]-[KR]-[LIVMSTAC]

[1] Simmer J.P., Kelly R.E., Rinker A.G. Jr., Scully J.L., Evans D.R. J. Biol. Chem. 265:10395-10402(1990).
[2] Davidson J.N., Chen K.C., Jamison R.S., Musmanno L.A., Kern C.B. BioEssays 15:157-164(1993).

[0321] 87. CRAL_TRIO (CRAL/TRIO domain)
[1]

Medline: 96121119

Crystal structure of the Saccharomyces cerevisiae phosphatidylinositol-transfer protein.
Sha B, Phillips SE, Bankaitis VA, Luo M;
Nature 1998;391:506-510.

The original profile has been extended to include the carboxyl domain from the known structure of Sec14. Swiss:
P10911 has not been included in the Pfam family because it does not appear to contain a complete structural domain.
Number of members: 39

[0322] 88. CSD ('Cold-shock' DNA-binding domain)
[1]

Medline: 94255482
Crystal structure of CspA, the major cold shock protein of Escherichia coli.
Schindelin H, Jiang W, Inouye M, Heinemann U;
Proc Natl Acad Sci U S A 1994;91:5119-5123.
Number of members: 121

[0323] A conserved domain of about 70 amino acids has been found in prokaryotic and eukaryotic DNA-binding
proteins [1,2,3,E1]. This domain, which is known as the 'cold-shock domain' (CSD), is present in the proteins listed below.

- Escherichia coli protein CS7.4 (gene cspA), which is induced in response to low temperature (cold-shock protein)
and which binds to and stimulates the transcription of the CCAAT-containing promoters of the HN-S protein and
of gyrA.
- Mammalian Y box binding protein 1 (YB1). A protein that binds to the CCAAT-containing Y box of mammalian HLA
class II genes.
- Xenopus Y box binding proteins -1 and -2 (Y1 and Y2). Proteins that bind to the CCAAT-containing Y box of
Xenopus hsp70 genes.
- Xenopus B box binding protein (YB3). YB3 binds the B box promoter element of genes transcribed by RNA
polymerase III.
- Enhancer factor I subunit A (EFI-A) (dbpB). A protein that also binds to the CCAAT motif in various gene promoters.
- DbpA, a human DNA-binding protein of unknown specificity.
- Bacillus subtilis cold-shock proteins cspB and cspC.
- Streptomyces clavuligerus protein SC 7.0.
- Escherichia coli proteins cspB, cspC, cspD, cspE and cspF.
- Unr, a mammalian gene encoded upstream of the N-ras gene. Unr contains nine repeats that are similar to the
CSD domain. The function of Unr is not yet known but it could be a multivalent DNA-binding protein.

[0324] As a signature pattern for the CSD domain, its most conserved region, which is located in its N-terminal section,
has been selected. It must be noted that the beginning of this region is highly similar [4] to the RNP-1 RNA-binding motif.

- Consensus pattern: [FY]-G-F-I-x(8)-[DER]-[LIVM]-F-x-H-x-[STKR]-x-[LIVMFY]

[1] Doniger J., Landsman D., Gonda M.A., Wistow G. New Biol. 4:389-395(1992).
[2] Wistow G. Nature 344:823-824(1990).
[3] Jones P.G., Inouye M. Mol. Microbiol. 11:811-818(1994).
[4] Landsman D. Nucleic Acids Res. 20:2861-2864(1992).

[0325] 89. CTF_NFI (CTF/NF-I family)
Number of members: 45 

[0326] Nuclear factor I (NF-I) or CCAAT box-binding transcription factor (CTF) [1,2] (also known as TGGCA-binding
proteins) are a family of vertebrate nuclear proteins which recognize and bind, as dimers, the palindromic DNA
sequence 5'-TGGCANNNTGCCA-3'. CTF/NF-I binding sites are present in viral and cellular promoters and in the origin
of DNA replication of Adenovirus type 2.
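The binding site 5'-TGGCANNNTGCCA-3' is palindromic in the sense that its two half-sites, TGGCA and TGCCA, are reverse complements of each other, separated by three arbitrary nucleotides. A minimal sketch of how candidate sites might be located in a DNA sequence is given below; the names `NFI_SITE`, `reverse_complement` and `find_nfi_sites` and the example sequence are illustrative assumptions and do not correspond to any sequence of the invention.

```python
import re

# N is treated as any of the four unambiguous nucleotides.
NFI_SITE = re.compile(r"TGGCA[ACGT]{3}TGCCA")
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]

def find_nfi_sites(dna: str) -> list[tuple[int, str]]:
    """Return (1-based start, matched site) for each consensus site on the given strand."""
    dna = dna.upper()
    return [(m.start() + 1, m.group()) for m in NFI_SITE.finditer(dna)]

# The two half-sites are reverse complements of each other.
assert reverse_complement("TGGCA") == "TGCCA"

# Hypothetical sequence constructed only to exercise the search.
print(find_nfi_sites("aaTGGCAcgtTGCCAtt"))
```

Because the flanking half-sites are reverse complements, a site that satisfies the consensus on one strand satisfies it on the other strand as well, so scanning a single strand is sufficient for exact matches.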

[0327] The CTF/NF-I proteins were first identified as nuclear factor I, a collection of proteins that activate the
replication of several Adenovirus serotypes (together with NF-II and NF-III) [3]. The family of proteins was also identified
as the CTF transcription factors, before the NFI and CTF families were found to be identical [4]. The CTF/NF-I proteins
are individually capable of activating transcription and DNA replication. The CTF/NF-I family name has also been
dubbed as NFI, NF-I or NF1.

[0328] In a given species, there are a large number of different CTF/NF-I proteins. The multiplicity of CTF/NF-I is
known to be generated both by alternative splicing and by the occurrence of four different genes. The known forms of
NF-I genes have been classified as:

- The CTF-like factors subfamily (prototype form: CTF-1) [4].
- The NFI-X proteins.
- The NFI-A proteins.
- The NFI-B proteins.
[0329] So far all CTF/NF-I family members appear to have similar transcription and replication activities.
[0330] CTF/NF-I proteins contain 400 to 600 amino acids. The N-terminal 200 amino-acid sequence, almost
perfectly conserved in all species and genes sequenced, mediates site-specific DNA recognition, protein dimerization and
Adenovirus DNA replication. The C-terminal 100 amino acids contain the transcriptional activation domain. This
activation domain is the target of gene expression regulatory pathways elicited by growth factors and it interacts with basal
transcription factors and with histone H3 [6].

[0331] A perfectly conserved, highly charged 12-residue peptide located in the N-terminal part of CTF/NF-I has been
selected as a specific signature for this family of proteins.

- Consensus pattern: R-K-R-K-Y-F-K-K-H-E-K-R

[1] Mermod N., O'Neill E.A., Kelly T.J., Tjian R. Cell 58:741-753(1989).
[2] Rupp R.A.W., Kruse U., Multhaup G., Goebel U., Beyreuther K., Sippel A.E. Nucleic Acids Res. 18:2607-2616(1990).
[3] Nagata K., Guggenheimer R.A., Enomoto T., Lichy J.H., Hurwitz J. Proc. Natl. Acad. Sci. U.S.A. 79:6438-6442(1982).
[4] Santoro C., Mermod N., Andrews P.C., Tjian R. Nature 334:218-224(1988).
[5] Gil G., Smith J.R., Goldstein J.L., Slaughter C.A., Orth K., Brown M.S., Osborne T.F. Proc. Natl. Acad. Sci. U.S.A. 85:8963-8967(1988).
[6] Alevizopoulos A., Dusserre Y., Tsai-Pflugfelder M., von der Weid T., Wahli W., Mermod N. Genes Dev. 9:3051-3068(1995).

[0332] 90. Calsequestrin (Calsequestrin)
Number of members: 13

[0333] Calsequestrin is a moderate-affinity, high-capacity calcium-binding protein of cardiac and skeletal muscle,
where it is located in the lumenal space of the sarcoplasmic reticulum terminal cisternae. Calsequestrin acts as a
calcium buffer and plays an important role in muscle excitation-contraction coupling. It is a highly acidic protein of
about 400 amino acid residues that binds more than 40 moles of calcium per mole of protein. There are at least two
different forms of calsequestrin: one which is expressed in cardiac muscles and another in skeletal muscles. Both
forms have highly similar sequences.

[0334] Two signature sequences have been developed. The first corresponds to the N-terminus of the mature protein;
the second is located just in front of the C-terminus of the protein, which is composed of a highly acidic tail of variable
length.

- Consensus pattern: [EQ]-[DE]-G-L-[DN]-F-P-x-Y-D-G-x-D-R-V

- Consensus pattern: [DEH--E^VyHLI^>E-[>V-L-x^xH;LIVM}^-T^-MWD 

[0335] [1] Treves S., Vilsen B., Chiozzi P., Andersen J.P., Zorzato F.
Biochem. J. 283:767-772(1992).
[0336] 91. Carboxyl_trans (Carboxyl transferase domain)

[1] 

Medline: 93374821 

Primary structure of the monomer of the 12S subunit of transcarboxylase as deduced from DNA and characterization
of the product expressed in Escherichia coli.

Thornton CG, Kumar GK, Haase FC, Phillips NF, Woo SB, Park VM,
Magner WJ, Shenoy BC, Wood HG, Samols D; J Bacteriol 1993;175:5301-5308.
[2] 

Medline: 93358891 
Molecular evolution of biotin-dependent carboxylases.
Toh H, Kondo H, Tanabe T;
Eur J Biochem 1993;215:687-696.
All of the members in this family are biotin-dependent carboxylases.

The carboxyl transferase domain carries out the following reaction: transcarboxylation from biotin to an acceptor
molecule. There are two recognised types of carboxyl transferase. One of them uses acyl-CoA and the other uses 2-oxo
acid as the acceptor molecule of carbon dioxide.
All of the members in this family utilise acyl-CoA as the acceptor molecule.
Number of members: 47 

[0337] 92. Chal_stil_synt (Chalcone and stilbene synthases)
Number of members: 146 

[0338] Chalcone synthases (CHS) (EC 2.3.1.74) and stilbene synthases (STS) (formerly known as resveratrol
synthases) are related plant enzymes [1]. CHS is an important enzyme in flavonoid biosynthesis and STS a key enzyme
in stilbene-type phytoalexin biosynthesis. Both enzymes catalyze the addition of three molecules of malonyl-CoA to a
starter CoA ester (a typical example is 4-coumaroyl-CoA), producing either a chalcone (with CHS) or stilbene (with
STS).

[0339] These enzymes are proteins of about 390 amino-acid residues. A conserved cysteine residue located in the
central section of these proteins has been shown [2] to be essential for the catalytic activity of both enzymes and
probably represents the binding site for the 4-coumaroyl-CoA group. The region around this active site residue is well
conserved and can be used as a signature pattern.

[0340] In addition to the plant enzymes, this family also includes Bacillus subtilis bcsA.

- Consensus pattern: R-[LIVMFYS]-x-[LIVM]-x-[QHG]-x-G-C-[FYNA]-[GA]-G-[GA]-[STAV]-x-[LIVMF]-[RA] [C is the
active site residue]

[1] Schroeder J., Schroeder G. Z. Naturforsch. 45C:1-8(1990).
[2] Lanz T., Tropf S., Marner F.-J., Schroeder J., Schroeder G. J. Biol. Chem. 266:9971-9976(1991).

[0341] 93. Chorismate_synt (Chorismate synthase)

Number of members: 19

[0342] Chorismate synthase (EC 4.6.1.4) catalyzes the last of the seven steps in the shikimate pathway, which is
used in prokaryotes, fungi and plants for the biosynthesis of aromatic amino acids. It catalyzes the 1,4-trans elimination
of the phosphate group from 5-enolpyruvylshikimate-3-phosphate (EPSP) to form chorismate, which can then be used
in phenylalanine, tyrosine or tryptophan biosynthesis. Chorismate synthase requires the presence of a reduced flavin
mononucleotide (FMNH2 or FADH2) for its activity.

[0343] Chorismate synthase from various sources shows [1,2] a high degree of sequence conservation. It is a protein
of about 360 to 400 amino-acid residues. Three signature patterns have been developed from conserved regions rich
in basic residues (mostly arginines). The first is in the N-terminal section, the second is central and the third is C-terminal.

- Consensus pattern: G-E-S-H-[GC]-x(2)-[LIVM]-[GTV]-x-[LIVM](2)-[DE]-G-x-[PV]

- Consensus pattern: [GE]-R-[SA](2)-[SAG]-R-[EV]-[ST]-x(2)-[RH]-V-x(2)-G

- Consensus pattern: R-[SH]-D-[PSV]-[CSAV]-x(4)-[GAI]-x-[IVGSP]-[LIVM]-x-E-[STAH]-[LIVM]

[1] Schaller A., Schmid J., Leibinger U., Amrhein N. J. Biol. Chem. 266:21434-21438(1991).
[2] Jones D.G.L., Reusser U., Braus G.H. Mol. Microbiol. 5:2143-2152(1991).

[0344] 94. Clat_adaptor_s (Clathrin adaptor complex small chain)

Number of members: 21

[0345] Clathrin-coated vesicles (CCV) mediate intracellular membrane traffic such as receptor-mediated endocytosis.
In addition to clathrin, the CCV are composed of a number of other components, including oligomeric complexes which
are known as adaptor or clathrin assembly proteins (AP) complexes [1]. The adaptor complexes are believed to interact
with the cytoplasmic tails of membrane proteins, leading to their selection and concentration. In mammals two types of
adaptor complexes are known: AP-1, which is associated with the Golgi complex, and AP-2, which is associated with
the plasma membrane. Both AP-1 and AP-2 are heterotetramers that consist of two large chains, the adaptins
(gamma and beta' in AP-1; alpha and beta in AP-2), a medium chain (AP47 in AP-1; AP50 in AP-2) and a small chain
(AP19 in AP-1; AP17 in AP-2).

[0346] The small chains of AP-1 and AP-2 are evolutionarily related proteins of about 18 Kd. Homologs of AP17 and
AP19 have also been found in yeast (genes APS1/YAP19 and APS2/YAP17) [2,3,4]. AP17 and AP19 are also related
to the zeta-chain [5] of coatomer (zeta-cop), a cytosolic protein complex that reversibly associates with Golgi
membranes to form vesicles that mediate biosynthetic protein transport from the endoplasmic reticulum, via the Golgi, up
to the trans Golgi network.
[0347] A conserved region in the central section of these proteins has been selected as a signature pattern.

- Consensus pattern: [LIVM](2)-Y-[KR]-x(4)-L-Y-F

[1] Pearse B.M., Robinson M.S. Annu. Rev. Cell Biol. 6:151-171(1990).
[2] Kirchhausen T., Davis A.C., Frucht S., O'Brine Greco B., Payne G.S., Tubb B. J. Biol. Chem. 266:11153-11157(1991).
[3] Nakai M., Takada T., Endo T. Biochim. Biophys. Acta 1174:282-284(1993).
[4] Phan H.L., Finlay J.A., Chu D.S., Tan P.K., Kirchhausen T., Payne G.S. EMBO J. 13:1706-1717(1994).
[5] Kuge O., Hara-Kuge S., Orci L., Ravazzola M., Amherdt M., Tanigawa G., Wieland F.T., Rothman J.E.
J. Cell Biol. 123:1727-1734(1993).

[0348] 95. Clathrin_lg_ch (Clathrin light chain)
Number of members: 8

[0349] Clathrin [1,2] is the major coat-forming protein that encloses vesicles such as coated pits and forms cell
surface patches involved in membrane traffic within eukaryotic cells. The clathrin coats (called triskelions) are
composed of three heavy chains (180 Kd) and three light chains (23 to 27 Kd).

[0350] The clathrin light chains [3], which may help to properly orient the assembly and disassembly of the clathrin
coats, bind non-covalently to the heavy chain; they also bind calcium and interact with the hsc70 uncoating ATPase.

- In higher eukaryotes two genes code for distinct but related light chains: LC(a) and LC(b). Each of the two genes
can yield, by tissue-specific alternative splicing, two separate forms, which differ by the insertion of a sequence of
respectively thirty or eighteen residues. There is, in the N-terminal part of the clathrin light chains, a domain of
twenty-one amino acid residues which is perfectly conserved in LC(a) and LC(b).
- In yeast there is a single light chain (gene CLC1) whose sequence is only distantly related to that of higher
eukaryotes.

[0351] Two signature patterns have been developed for clathrin light chains. The first pattern is a heptapeptide from
the center of the conserved N-terminal region of eukaryotic light chains; the second pattern is derived from a positively
charged region located in the C-terminal extremity of all known clathrin light chains.

- Consensus pattern: F-L-A-Q-Q-E-S 

[1] Keen J.H. Annu. Rev. Biochem. 59:415-438(1990).
[2] Brodsky F.M. Science 242:1398-1402(1988).
[3] Brodsky F.M., Hill B.L., Acton S.L., Naethke I., Wong D.H., Ponnambalam S., Parham P.
Trends Biochem. Sci. 16:208-213(1991).

[0352] 96. (Clathrin repeat) 7-fold repeat in Clathrin and VPS

Each repeat is about 140 amino acids long. The repeats occur in the arm region of the Clathrin heavy chain.
Number of members: 79

[1]
Medline: 92191269

Folding and trimerization of clathrin subunits at the triskelion hub.
Nathke IS, Heuser J, Lupas A, Stock J, Turck CW, Brodsky FM;
Cell 1992;68:899-910.
[2]
Medline: 88097376

Clathrin heavy chain: molecular cloning and complete primary structure.
Kirchhausen T, Harrison SC, Chow EP, Mattaliano RJ,
Ramachandran KL, Smart J, Brosius J;
Proc Natl Acad Sci U S A 1987;84:8805-8809.
[0353] 97. Collagen (Collagen triple helix repeat (20 copies))
[1] Medline: 94059583
New members of the collagen superfamily.
Mayne R, Brewton RG;
Curr Opin Cell Biol 1993;5:883-890.
Scurvy is associated with collagens.
Members of this family belong to the collagen superfamily [1].

Collagens are generally extracellular structural proteins involved in formation of connective tissue structure.
The alignment contains 20 copies of the G-X-Y repeat that forms a triple helix. The first position of the repeat is glycine;
the second and third positions can be any residue but are frequently proline and hydroxyproline. Collagens are
post-translationally modified by proline hydroxylase to form the hydroxyproline residues. Defective hydroxylation is the cause
of scurvy.
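The G-X-Y repeat described above can be recognised as a run of consecutive residue triplets that each begin with glycine. As a rough sketch under that simplified definition, the following Python reports the longest such run in a sequence; the names `GXY_RUN` and `longest_gxy_run` and the example sequences are illustrative assumptions, and the sketch does not reproduce the profile-based alignment used to define the family.

```python
import re

# A lookahead is used so that runs starting at every position (all three frames)
# are examined, not only non-overlapping ones.
GXY_RUN = re.compile(r"(?=((?:G..)+))")

def longest_gxy_run(sequence: str) -> int:
    """Return the largest number of consecutive G-X-Y triplets found in the sequence."""
    runs = [len(m.group(1)) // 3 for m in GXY_RUN.finditer(sequence)]
    return max(runs, default=0)

# Hypothetical sequences constructed only to exercise the function.
print(longest_gxy_run("MKGPPGAPGEPT"))
print(longest_gxy_run("GGPPGPP"))
```

A sequence belonging to this family would be expected to contain a run approaching the 20 copies mentioned above, whereas short runs occur by chance in many unrelated proteins.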

Some members of the collagen superfamily are not involved in connective tissue structure but share the same triple
helical structure.

Number of members: 2125 

[0354] 98. Coprogen_oxidas (Coproporphyrinogen III oxidase)
Number of members: 12

Coproporphyrinogen III oxidase (EC 1.3.3.3) (coproporphyrinogenase) [1,2] catalyzes the oxidative decarboxylation
of coproporphyrinogen III into protoporphyrinogen IX, a common step in the pathway for the biosynthesis of porphyrins
such as heme, chlorophyll or cobalamin.

[0355] Coproporphyrinogen III oxidase is an enzyme that requires iron for its activity. A cysteine seems to be important
for the catalytic mechanism [3]. Sequences from a variety of eukaryotic and prokaryotic sources show that this enzyme
has been evolutionarily conserved. A highly conserved region in the central part of the sequence has been selected
as a signature pattern. This region contains the only conserved cysteine and is rich in charged amino acids.

- Consensus pattern: K-x-W-C-x(2)-[FYH](3)-[LIVM]-x-H-R-x-E-x-R-G-[LIVM]-G-G-[LIVM]-F-F-D



[1] Xu K., Elliott T. J. Bacteriol. 175:4990-4999(1993).
[2] Kohno H., Furukawa T., Yoshinaga T., Tokunaga R., Taketani S. J. Biol. Chem. 268:21359-21363(1993).
[3] Camadro J.M., Chambon H., Jolles J., Labbe P. Eur. J. Biochem. 156:579-587(1986).
[4] Xu K., Elliott T. J. Bacteriol. 176:3196-3203(1994).



[0356] 99. Corona_nucleoca (Coronavirus nucleocapsid protein)
[1]

Medline: 98087828

Identification of a specific interaction between the
coronavirus mouse hepatitis virus A59 nucleocapsid protein
and packaging signal.
Molenkamp R, Spaan WJ;
Virology 1997;239:78-86.

Number of members: 44 
[0357] 100. Cu-oxidase (Multicopper oxidase)
[1]

Medline: 90126844
The blue oxidases, ascorbate oxidase, laccase and ceruloplasmin.
Modelling and structural relationships.
Messerschmidt A, Huber R;
Eur J Biochem 1990;187:341-352.
Number of members: 150 

[0358] Multicopper oxidases [1,2] are enzymes that possess three spectroscopically different copper centers. These
centers are called: type 1 (or blue), type 2 (or normal) and type 3 (or coupled binuclear). The enzymes that belong to
this family are:

- Laccase (EC 1.10.3.2) (urishiol oxidase), an enzyme found in fungi and plants, which oxidizes many different types
of phenols and diamines.

- Ascorbate oxidase (EC 1.10.3.3), a higher plant enzyme.

- Ceruloplasmin (EC 1.16.3.1) (ferroxidase), a protein found in the serum of mammals and birds, which oxidizes a
great variety of inorganic and organic substances. Structurally ceruloplasmin exhibits internal sequence homology,
and seems to have evolved from the triplication of a copper-binding domain similar to that found in laccase and
ascorbate oxidase.

[0359] In addition to the above enzymes there are a number of proteins which, on the basis of sequence similarities,
can be said to belong to this family. These proteins are:

- Copper resistance protein A (copA) from a plasmid in Pseudomonas syringae. This protein seems to be involved
in the resistance of the microbial host to copper.
- Blood coagulation factor V (Fa V).
- Blood coagulation factor VIII (Fa VIII) [E1].

- Yeast FET3 [3], which is required for ferrous iron uptake.

- Yeast hypothetical protein YFL041w and SpAC1F7.08c, the fission yeast homolog.

[0360] Factors V and VIII act as cofactors in blood coagulation and are structurally similar [4]. Their sequence consists
of a triplicated A domain, a B domain and a duplicated C domain, in the following order: A-A-B-A-C-C. The A-type
domain is related to the multicopper oxidases.

[0361] Two signature patterns have been developed for these proteins. Both patterns are derived from the same
region, which in ascorbate oxidase, laccase, in the third domain of ceruloplasmin, and in copA, contains five residues
that are known to be involved in the binding of copper centers. The first pattern does not make any assumption on the
presence of copper-binding residues and thus can detect domains that have lost the ability to bind copper (such as
those in Fa V and Fa VIII), while the second pattern is specific to copper-binding domains.

- Consensus pattern: G-x-[FYW]-x-[LIVMFYW]-x-[CST]-x(8)-G-[LM]-x(3)-[LIVMFYW]

- Consensus pattern: H-C-H-x(3)-H-x(3)-[AG]-[LM] [The first two H's are copper type 3 binding residues] [The C,
the third H, and L or M are copper type 1 ligands]
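
The division of labour between the two patterns can be sketched as a small classifier: the second pattern is tried first because it is specific to copper-binding domains, and the first pattern is used as a fallback for domains that may have lost their copper ligands. The regular expressions are hand translations of the two consensus patterns as read here; the function name `classify` and the returned labels are assumptions made for this illustration.

```python
import re

# Hand translations of the two consensus patterns (assumed readings).
DOMAIN_RE = re.compile(r"G.[FYW].[LIVMFYW].[CST].{8}G[LM].{3}[LIVMFYW]")
COPPER_RE = re.compile(r"HCH.{3}H.{3}[AG][LM]")

def classify(seq: str) -> str:
    """Label a sequence according to which multicopper oxidase signature it carries."""
    seq = seq.upper()
    if COPPER_RE.search(seq):
        return "copper-binding domain"
    if DOMAIN_RE.search(seq):
        return "domain present, copper ligands not confirmed"
    return "no signature found"

# Hypothetical fragments built to satisfy one pattern each, for demonstration only.
copper_like = "AA" + "HCHAAAHAAAGL" + "AA"
domain_only = "G" + "A" + "F" + "A" + "L" + "A" + "C" + "A" * 8 + "G" + "L" + "AAA" + "L"

print(classify(copper_like))
print(classify(domain_only))
```

A sequence such as a factor V or factor VIII A domain would be expected to fall into the middle category, although that depends on the actual sequence and is not verified here.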

[0362] 101. Cullin (Cullin family)
Number of members: 24

[0363] The following proteins are collectively termed cullins [1]:

- Caenorhabditis elegans cul-1 (or lin-19), a protein required for developmentally programmed transitions from the
G1 phase of the cell cycle to the G0 phase or the apoptotic pathway.

- Caenorhabditis elegans cul-2, cul-3, cul-4 (F45E12.3), cul-5 (ZK858.1) and cul-6 (K08E7.7).

- Mammalian CUL1, CUL2, CUL3, CUL4A and CUL4B.

- Mammalian vasopressin-activated calcium-mobilizing receptor (VACM-1), a kidney-specific protein thought to form
a cell surface receptor [2] but which does not have any structural hallmarks of a receptor.

- Drosophila lin19.

- Yeast CDC53 [3], which acts in concert with CDC4 and UBC3 (CDC34) to control the G1-to-S phase transition.
- Yeast hypothetical protein YGR003w.
- Fission yeast hypothetical protein SpAC24H6.03.

[0364] The cullins are hydrophilic proteins of 740 to 815 amino acids. The C-terminal extremity is the most conserved
part of these proteins. A signature pattern has been developed from that region.

- Consensus pattern: [LIV]-K-x(2)-[LIV]-x(2)-L-I-[DEQ]-[KRHNQ]-x-Y-[LIVM]-x-R-x(6,7)-[FY]-x-Y-x-[SA]>
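
The trailing '>' in this pattern indicates that the signature must end at the C-terminus of the protein. A minimal sketch of that constraint follows, assuming the pattern reads as [LIV]-K-x(2)-[LIV]-x(2)-L-I-[DEQ]-[KRHNQ]-x-Y-[LIVM]-x-R-x(6,7)-[FY]-x-Y-x-[SA]>; the regular expression is a hand translation, and `has_cullin_signature` and `TAIL` are hypothetical names introduced only for this example.

```python
import re

# The trailing '>' of the consensus pattern becomes '$', so a match is only
# accepted when it ends at the last residue of the sequence.
CULLIN_CTERM = re.compile(
    r"[LIV]K.{2}[LIV].{2}LI[DEQ][KRHNQ].Y[LIVM].R.{6,7}[FY].Y.[SA]$"
)

def has_cullin_signature(seq: str) -> bool:
    """True if the sequence ends with the cullin C-terminal signature."""
    return CULLIN_CTERM.search(seq.upper()) is not None

# Hypothetical tail constructed to satisfy the pattern, for demonstration only.
TAIL = "LKAALAALIDKAYLAR" + "A" * 6 + "FAYAS"

print(has_cullin_signature("MMM" + TAIL))   # signature at the C-terminus
print(has_cullin_signature(TAIL + "MMM"))   # same residues, but not at the end
```

The same residues placed in the interior of a sequence are rejected, which is the behaviour the anchor is meant to enforce.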

[1] Kipreos E.T., Lander L.E., Wing J.P., He W.W., Hedgecock E.M.
Cell 85:829-839(1996).
[2] Burnatowska-Hledin M.A., Spielman W.S., Smith W.L., Shi R., Meyer J.M., Dewitt D.L.
Am. J. Physiol. 268:F1198-F1210(1995).
[3] Mathias N., Johnson S.L., Winey M., Adams A.E., Goetsch L., Pringle J.R., Byers B., Goebl M.G.
Mol. Cell. Biol. 16:6634-6643(1996).

[0365] 102. (Cu_amine_oxid)
Copper amine oxidase signatures
Amine oxidases (AO) [1] are enzymes that catalyze the oxidation of a wide range of biogenic amines including many
neurotransmitters, histamine and xenobiotic amines. There are two classes of amine oxidases: flavin-containing (EC
1.4.3.4) and copper-containing (EC 1.4.3.6).

[0366] Copper-containing AO is found in bacteria, fungi, plants and animals. It is a homodimeric enzyme that binds
one copper ion per subunit as well as a 2,4,5-trihydroxyphenylalanine quinone (or topaquinone) (TPQ) cofactor. This
cofactor is derived from a tyrosine residue.

[0367] Two signature patterns were derived for copper AO: the first one contains the tyrosine which gives rise to the
TPQ cofactor, while the second one contains one of the three histidines that bind the copper atom [2].
[0368] Consensus pattern: [LIVM]-[LIVMA]-[LIVMF]-x(4)-[ST]-x(2)-N-Y-[DE]-[YN] [The first Y gives rise to TPQ]. Sequences
known to belong to this class detected by the pattern: ALL.

[0369] Consensus pattern: T-x-[GS]-x(2)-H-[LIVMF]-x(3)-E-[DE]-x-P [H is a copper ligand]. Sequences known to belong
to this class detected by the pattern: ALL, except for lentil AO.

[1] Knowles P.F., Dooley D.M. (In) Metal ions in biological systems, Sigel H., Sigel A., Eds., vol. 30, pp. 361-403, Marcel
Dekker, New York, (1993).

[2] Parsons M.R., Convery M.A., Wilmot C.M., Yadav K.D.S., Blakeley V., Corner A.S., Phillips S.E.V., McPherson
M.J., Knowles P.F. Structure 3:1171-1184(1995).

[0370] 103. Cys-protease (Cysteine protease)
Number of members: 358 

[0371] Eukaryotic thiol proteases (EC 3.4.22.-) [1] are a family of proteolytic enzymes which contain an active site
cysteine. Catalysis proceeds through a thioester intermediate and is facilitated by a nearby histidine side chain; an
asparagine completes the essential catalytic triad. The proteases which are currently known to belong to this family
are listed below (references are only provided for recently determined sequences):

- Vertebrate lysosomal cathepsins B (EC 3.4.22.1), H (EC 3.4.22.16), L (EC 3.4.22.15), and S (EC 3.4.22.27) [2].
- Vertebrate lysosomal dipeptidyl peptidase I (EC 3.4.14.1) (also known as cathepsin C) [2].
- Vertebrate calpains (EC 3.4.22.17). Calpains are intracellular calcium-activated thiol proteases that contain both an
N-terminal catalytic domain and a C-terminal calcium-binding domain.
- Mammalian cathepsin K, which seems involved in osteoclastic bone resorption [3].

- Human cathepsin O [4].

- Bleomycin hydrolase. An enzyme that catalyzes the inactivation of the antitumor drug BLM (a glycopeptide).

- Plant enzymes: barley aleurain (EC 3.4.22.16), EP-B1/B4; kidney bean EP-C1; rice bean SH-EP; kiwi fruit actinidin
(EC 3.4.22.14); papaya latex papain (EC 3.4.22.2), chymopapain (EC 3.4.22.6), caricain (EC 3.4.22.30), and
proteinase IV (EC 3.4.22.25); pea turgor-responsive protein 15A; pineapple stem bromelain (EC 3.4.22.32); rape
CGT44; rice oryzain alpha, beta, and gamma; tomato low-temperature induced; Arabidopsis thaliana A494, RD19A
and RD21A.

- House-dust mite allergens DerP1 and EurM1.

- Cathepsin B-like proteinases from the worms Caenorhabditis elegans (genes gcp-1, cpr-3, cpr-4, cpr-5 and cpr-8),
Schistosoma mansoni (antigen SM31) and japonicum (antigen SJ31), Haemonchus contortus (genes AC-1 and
AC-2), and Ostertagia ostertagi (CP-1 and CP-3).

- Slime mold cysteine proteinases CP1 and CP2.
- Cruzipain from Trypanosoma cruzi and brucei.

- Trophozoite cysteine proteinase (TCP) from various Plasmodium species.
- Proteases from Leishmania mexicana, Theileria annulata and Theileria parva.
- Baculoviruses cathepsin-like enzyme (v-cath).

- Drosophila small optic lobes protein (gene sol), a neuronal protein that contains a calpain-like domain.

- Yeast thiol protease BLH1/YCP1/LAP3.

- Caenorhabditis elegans hypothetical protein C06G4.2, a calpain-like protein.

[0372] Two bacterial peptidases are also part of this family:

- Aminopeptidase C from Lactococcus lactis (gene pepC) [5].
- Thiol protease tpr from Porphyromonas gingivalis.

[0373] Three other proteins are structurally related to this family, but may have lost their proteolytic activity:

- Soybean oil body protein P34. This protein has its active site cysteine replaced by a glycine.
- Rat testin, a Sertoli cell secretory protein highly similar to cathepsin L but with the active site cysteine replaced
by a serine. Rat testin should not be confused with mouse testin, which is a LIM-domain protein (see
<PDOC00382>).

- Plasmodium falciparum serine-repeat protein (SERA), the major blood stage antigen. This protein of 111 Kd
possesses a C-terminal thiol-protease-like domain [6], but the active site cysteine is replaced by a serine.

[0374] The sequences around the three active site residues are well conserved and can be used as signature patterns.

- Consensus pattern: Q-x(3)-[GE]-x-C-[YW]-x(2)-[STAGC]-[STAGCV] [C is the active site residue]

- Consensus pattern: [LIVMGSTAN]-x-H-[GSACE]-[LIVM]-x-[LIVMAT](2)-G-x-[GSADNH] [H is the active site residue]

- Consensus pattern: [FYCH]-[WI]-[LIVT]-...-[LIVMF] [N is the active site residue]
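
Because each pattern marks its active site residue, a match also gives the position of that residue in the sequence. The sketch below does this for the cysteine and histidine patterns only, since the asparagine pattern is incomplete in this text. The regular expressions are hand translations under the assumption that the first two patterns read Q-x(3)-[GE]-x-C-[YW]-x(2)-[STAGC]-[STAGCV] and [LIVMGSTAN]-x-H-[GSACE]-[LIVM]-x-[LIVMAT](2)-G-x-[GSADNH]; `active_site_positions` and the demonstration sequence are hypothetical.

```python
import re

# Hand translations of the cysteine and histidine consensus patterns (assumed readings).
CYS_RE = re.compile(r"Q.{3}[GE].C[YW].{2}[STAGC][STAGCV]")
HIS_RE = re.compile(r"[LIVMGSTAN].H[GSACE][LIVM].[LIVMAT]{2}G.[GSADNH]")

def active_site_positions(seq: str):
    """Return 1-based positions of the putative active-site Cys and His (None if absent)."""
    seq = seq.upper()
    cys = CYS_RE.search(seq)
    his = HIS_RE.search(seq)
    # The Cys is the 7th residue of the first pattern and the His is the 3rd
    # residue of the second, so their offsets from the match start are fixed.
    return (cys.start() + 7 if cys else None,
            his.start() + 3 if his else None)

# Hypothetical sequence carrying both signatures, for demonstration only.
demo = "MM" + "QAAAGACYAASS" + "AAAA" + "LAHGLALLGAG" + "MM"
print(active_site_positions(demo))
```

A protein such as rat testin, in which the active site cysteine is replaced, would be expected to fail the first pattern while possibly still matching the second; that expectation is not checked here.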

[1] Dufour E.
Biochimie 70:1335-1342(1988).
[2] Kirschke H., Barrett A.J., Rawlings N.D.
Protein Prof. 2:1587-1643(1995).
[3] Shi G.-P., Chapman H.A., Bhairi S.M., Deleeuw C., Reddy V.Y., Weiss S.J.
FEBS Lett. 357:129-134(1995).
[4] Velasco G., Ferrando A.A., Puente X.S., Sanchez L.M., Lopez-Otin C.
J. Biol. Chem. 269:27136-27142(1994).
[5] Chapot-Chartier M.P., Nardi M., Chopin M.C., Chopin A., Gripon J.C.
Appl. Environ. Microbiol. 59:330-333(1993).
[6] Higgins D.G., McDonnell D.J., Sharp P.M.
Nature 340:604-604(1989).
[7] Rawlings N.D., Barrett A.J.
Meth. Enzymol. 244:461-486(1994).

[0375] 104. Cys_Met_Meta_PP (Cys/Met metabolism PLP-dependent enzyme)
[1] Medline: 96428687

Crystal structure of the pyridoxal-5'-phosphate dependent cystathionine beta-lyase from Escherichia coli at 1.83 A.
Clausen T, Huber R, Laber B, Pohlenz HD, Messerschmidt A;
J Mol Biol 1996;262:202-224.
[2] Medline: 99059720

Crystal structure of Escherichia coli cystathionine gamma-synthase at 1.5 A resolution.
Clausen T, Huber R, Prade L, Wahl MC, Messerschmidt A;
EMBO J 1998;17:6827-6838.
Database Reference: SCOP; 1cs1; fa; [SCOP-USA][CATH-PDBSUM]
This family includes enzymes involved in cysteine and methionine metabolism. The following are members:

- Cystathionine gamma-lyase.
- Cystathionine gamma-synthase.
- Cystathionine beta-lyase.
- Methionine gamma-lyase.
- OAH/OAS sulfhydrylase.
- O-succinylhomoserine sulfhydrylase.

All of these members participate in slightly different reactions.
All these enzymes use PLP (pyridoxal-5'-phosphate) as a cofactor.
Number of members: 52 

[0376] A number of pyridoxal-dependent enzymes involved in the metabolism of cysteine, homocysteine and
methionine have been shown [1,2] to be evolutionarily related. These are:

- Cystathionine gamma-lyase (EC 4.4.1.1) (gamma-cystathionase), which catalyzes the transformation of
cystathionine into cysteine, oxobutanoate and ammonia. This is the final reaction in the transulfuration pathway that leads
from methionine to cysteine in eukaryotes.

- Cystathionine gamma-synthase (EC 4.2.99.9), which catalyzes the conversion of cysteine and succinyl-homoserine
into cystathionine and succinate: the first step in the biosynthesis of methionine from cysteine in bacteria (gene
metB).

- Cystathionine beta-lyase (EC 4.4.1.8) (beta-cystathionase), which catalyzes the conversion of cystathionine into
homocysteine, pyruvate and ammonia: the second step in the biosynthesis of methionine from cysteine in bacteria
(gene metC).

- Methionine gamma-lyase (EC 4.4.1.11) (L-methioninase), which catalyzes the transformation of methionine into
methanethiol, oxobutanoate and ammonia.

- OAH/OAS sulfhydrylase, which catalyzes the conversion of acetylhomoserine into homocysteine and that of
acetylserine into cysteine (gene MET17 or MET25 in yeast).
- O-succinylhomoserine sulfhydrylase (EC 4.2.99.-).
- Yeast hypothetical protein YGL184c.
- Yeast hypothetical protein YHR112c.

[0377] These enzymes are proteins of about 400 amino-acid residues. The pyridoxal-P group is attached to a lysine
residue located in the central section of these enzymes; the sequence around this residue is highly conserved and can
be used as a signature pattern to detect this class of enzymes.

- Consensus pattern: [DG]-[LIVMF]-x(3)-[STAGC]-[STAGCI]-T-K-[FYWQ]-[LIVMF]-x-G-[HQ]-[SGNH] [K is the
pyridoxal-P attachment site]

[1] Ono B.-I., Tanaka K., Naito K., Heike C., Shinoda S., Yamamoto S., Ohmori S., Oshima T., Toh-E A. J. Bacteriol.
174:3339-3347(1992).

[2] Barton A.B., Kaback D.B., Clark M.W., Keng T., Ouellette B.F.F., Storms R.K., Zeng B., Zhong W.W., Fortin
N., Delaney S., Bussey H. Yeast 9:363-369(1993).

[0378] 105. Cyt_reductase

FAD/NAD-binding Cytochrome reductase
Number of members: 60
[1] Medline: 95111952

Crystal structure of the FAD-containing fragment of corn nitrate reductase at 2.6 A resolution: relationship to other
flavoprotein reductases.

Lu G, Campbell WH, Schneider G, Lindqvist Y;
Structure 1994;2:809-821.
[2] Medline: 92084635

The sequence of squash NADH:nitrate reductase and its relationship to the sequences of other flavoprotein
oxidoreductases. A family of flavoprotein pyridine nucleotide cytochrome reductases.
Hyde GE, Crawford NM, Campbell WH;
J Biol Chem 1991;266:23542-23547.
[0379] 106. Cytidylyltrans
Phosphatidate cytidylyltransferase

Number of members: 21

[0380] Phosphatidate cytidylyltransferase (EC 2.7.7.41) [1,2,3] (also known as CDP-diacylglycerol synthase) (CDS)
is the enzyme that catalyzes the synthesis of CDP-diacylglycerol from CTP and phosphatidate (PA). CDP-diacylglycerol
is an important branch point intermediate in both prokaryotic and eukaryotic organisms. CDS is a membrane-bound
enzyme. A conserved region located in the C-terminal part has been selected as a signature pattern.

- Consensus pattern: S-x-[LIVMF]-K-R-x(4)-K-D-x-[GSA]-x(2)-[LI]-[PG]-x-H-G-G-[LIVM]-x-D-R-[LIVMF]-D

[1] Sparrow C.P., Raetz C.R.H.
J. Biol. Chem. 260:12084-12091(1985).
[2] Shen H., Heacock P.N., Clancey C.J., Dowhan W.
J. Biol. Chem. 271:789-795(1996).
[3] Saito S., Goto K., Tonosaki A., Kondo H.
J. Biol. Chem. 272:9503-9509(1997).

[0381] 107. (Cytidylyltransf) Cytidylyltransferase. This family includes: Cholinephosphate cytidylyltransferase;
Glycerol-3-phosphate cytidylyltransferase.
Number of members: 84

[0382] [1] Medline: 10206837. CTP:Phosphocholine Cytidylyltransferase: Insights into Regulatory Mechanisms and
Novel Functions. Clement JM, Kent C; Biochem Biophys Res Commun 1999;257:643-650.

[0383] 108. (cNMP_binding) Cyclic nucleotide-binding domain signatures and profile. Proteins that bind cyclic
nucleotides (cAMP or cGMP) share a structural domain of about 120 residues [1-3]. The best studied of these proteins is
the prokaryotic catabolite gene activator (also known as the cAMP receptor protein) (gene crp), where such a domain
is known to be composed of three alpha-helices and a distinctive eight-stranded, antiparallel beta-barrel structure. Such
a domain is known to exist in the following proteins:

- Prokaryotic catabolite gene activator protein (CAP).
- cAMP- and cGMP-dependent protein kinases (cAPK and cGPK). Both types of kinases contain two tandem copies
of the cyclic nucleotide-binding domain. The cAPKs are composed of two different subunits: a catalytic chain and a
regulatory chain which contains both copies of the domain. The cGPKs are single chain enzymes that include the two
copies of the domain in their N-terminal section. The nucleotide specificity of cAPK and cGPK is due to an amino acid
in the conserved region of beta-barrel 7: a threonine that is invariant in cGPK is an alanine in most cAPK.
- Vertebrate cyclic nucleotide-gated ion-channels. Two such cation channels have been fully characterized. One is
found in rod cells, where it plays a role in visual signal transduction. It specifically binds to cGMP, leading to an
opening of the channel and thereby causing a depolarization of rod photoreceptors. In olfactory epithelium a similar,
cAMP-binding, channel plays a role in odorant signal transduction.

There are six invariant amino acids in this domain, three of which are glycine residues that are thought to be essential
for maintenance of the beta-barrel. Two signature patterns for this domain have been developed. The first pattern is
located within beta-barrels 2 and 3 and contains the first two conserved Gly. The second pattern is located within
beta-barrels 6 and 7 and contains the third conserved Gly as well as the three other invariant residues.
First consensus pattern: [LIVM]-[VIC]-x(2)-G-[DENQTA]-x-[GAC]-x(2)-[LIVMFY](4)-x(2)-G

Second consensus pattern: [LIVMF]-G-E-x-[GAS]-[LIVM]-x(5,11)-R-[STAQ]-A-x-[LIVMA]-x-[STACV]

[1] Weber I.T., Shabb J.B., Corbin J.D. Biochemistry 28:6122-6127(1989).

[2] Kaupp U.B. Trends Neurosci. 14:150-157(1991).

[3] Shabb J.B., Corbin J.D. J. Biol. Chem. 267:5723-5726(1992).

[0384] 109. (cadherin)

Cadherins extracellular repeated domain signature

Cadherins [1,2] are a family of animal glycoproteins responsible for calcium-dependent cell-cell adhesion. Cadherins
preferentially interact with themselves in a homophilic manner in connecting cells, thus acting as both receptor and
ligand. A wide number of tissue-specific forms of cadherins are known:

- Epithelial (E-cadherin) (also known as uvomorulin or L-CAM) (CDH1).
- Neural (N-cadherin) (CDH2).
- Placental (P-cadherin) (CDH3).
- Retinal (R-cadherin) (CDH4).
- Vascular endothelial (VE-cadherin) (CDH5).
- Kidney (K-cadherin) (CDH6).
- Cadherin-8 (CDH8).
- Osteoblast (OB-cadherin) (CDH11).
- Brain (BR-cadherin) (CDH12).
- T-cadherin (truncated cadherin) (CDH13).
- Muscle (M-cadherin) (CDH14).
- Liver-intestine (LI-cadherin).
- EP-cadherin.

[0385] Structurally, cadherins are built of the following domains: a signal sequence, followed by a propeptide of about
130 residues, then an extracellular domain of around 600 residues, then a transmembrane region, and finally a
C-terminal cytoplasmic domain of about 150 residues. The extracellular domain can be sub-divided into five parts: there
are four repeats of about 110 residues followed by a region that contains four conserved cysteines. It is suggested that
the calcium-binding region of cadherins is located in the extracellular repeats.

[0386] Cadherins are evolutionarily related to the desmogleins, which are components of intercellular desmosome
junctions involved in the interaction of plaque proteins:

- Desmoglein 1 (desmosomal glycoprotein I).
- Desmoglein 2.
- Desmoglein 3 (Pemphigus vulgaris antigen).

[0387] The Drosophila fat protein [3] is a huge protein of over 5000 amino acids that contains 34 cadherin-like repeats
in its extracellular domain.

[0388] The signature pattern that was developed for the repeated domain is located in its C-terminal extremity,
which is its best conserved region. The pattern includes two conserved aspartic acid residues as well as two
asparagines; these residues could be implicated in the binding of calcium.
[0389] Consensus pattern: [LIV]-x-[LIV]-x-D-x-N-D-[NH]-x-P. Sequences known to belong to this class detected by the
pattern: ALL. Note: this pattern is found in the first, second, and fourth copies of the repeated domain. In the third copy
there is a deletion of one residue after the second conserved Asp.
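
Since the signature is expected in several copies of the repeated domain, a scan should report every occurrence rather than only the first. The following sketch assumes the pattern reads [LIV]-x-[LIV]-x-D-x-N-D-[NH]-x-P; the regular expression is a hand translation, and `repeat_hits` together with the artificial four-repeat sequence are hypothetical constructions for this example.

```python
import re

# Hand translation of the cadherin repeat signature (assumed reading).
CADHERIN_RE = re.compile(r"[LIV].[LIV].D.ND[NH].P")

def repeat_hits(seq: str):
    """Return the 1-based start position of every signature match in seq."""
    return [m.start() + 1 for m in CADHERIN_RE.finditer(seq.upper())]

# Artificial sequence: three copies carrying the signature and one copy in which
# a residue is missing, loosely mimicking the deletion noted for the third repeat.
motif = "LAVADANDNAP"
variant = "LAVADANDAP"          # one residue shorter, so the pattern no longer fits
spacer = "G" * 20
demo = motif + spacer + motif + spacer + variant + spacer + motif

print(repeat_hits(demo))
```

Three hits are reported for the artificial sequence, and the shortened copy is skipped, which mirrors the note that the pattern is absent from the third repeat.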

[1] Takeichi M. Annu. Rev. Biochem. 59:237-252(1990).
[2] Takeichi M. Trends Genet. 3:213-217(1987).

[3] Mahoney P.A., Weber U., Onofrechuk P., Biessmann H., Bryant P.J., Goodman C.S. Cell 67:853-868(1991).

[0390] 110. Calreticulin family signatures

Calreticulin [1] (also known as calregulin, CRP55 or HACBP) is a high-capacity calcium-binding protein which is present
in most tissues and located at the periphery of the endoplasmic (ER) and the sarcoplasmic reticulum (SR) membranes.
It probably plays a role in the storage of calcium in the lumen of the ER and SR and it may well have other important
functions. Structurally, calreticulin is a protein of about 400 amino acid residues consisting of three domains:

a) An N-terminal, probably globular, domain of about 180 amino acid residues (N-domain).
b) A central domain of about 70 residues (P-domain) which contains three repeats of an acidic 17 amino acid motif.
This region binds calcium with a low capacity, but a high affinity.
c) A C-terminal domain rich in acidic residues and in lysine (C-domain). This region binds calcium with a high capacity
but a low affinity.

Calreticulin is evolutionarily related to the following proteins:

- Onchocerca volvulus antigen RAL-1. RAL-1 is highly similar to calreticulin, but possesses a C-terminal domain rich in
lysine and arginine and lacks acidic residues and is therefore not expected to bind calcium in that region.
- Calnexin [2]. A calcium-binding protein that interacts with newly synthesized glycoproteins in the endoplasmic
reticulum. It seems to play a major role in the quality control apparatus of the ER by the retention of incorrectly folded
proteins.
- Calmegin [3] (or calnexin-T), a testis-specific calcium-binding protein highly similar to calnexin.

Three signature patterns have been developed for this family of proteins. The first two patterns are based on conserved
regions in the N-domain; the third pattern corresponds to positions 4 to 16 of the repeated motif in the P-domain.
Consensus pattern: [KRHN]-x-[DEQN]-[DEQNK]-x(3)-C-G-G-[AG]-[FY]-[LIVM]-[KN]-[LIVMFY](2)

Consensus pattern: [LIVM](2)-F-G-P-D-x-C-[AG]

Consensus pattern: [IV]-x-D-x-[DENST]-x(2)-K-P-[DEH]-D-W-[DEN]

[1] Michalak M., Milner R.C., Burns K., Opas M. Biochem. J. 285:681-692(1992).
[2] Bergeron J.J.M., Brenner M.B., Thomas D.Y., Williams D.B. Trends Biochem. Sci. 19:124-128(1994).
[3] Watanabe D., Yamada K., Nishina Y., Tajima Y., Koshimizu U., Nagata A., Nishimune Y. J. Biol. Chem.
269:7744-7749(1994).

[0391] 111. Eukaryotic-type carbonic anhydrases signature (carb_anhydrase)

Carbonic anhydrases (EC 4.2.1.1) (CA) [1,2,3,4] are zinc metalloenzymes which catalyze the reversible hydration of
carbon dioxide. Eight enzymatic and evolutionarily related forms of carbonic anhydrase are currently known to exist in
vertebrates: three cytosolic isozymes (CA-I, CA-II and CA-III); two membrane-bound forms (CA-IV and CA-VII); a
mitochondrial form (CA-V); a secreted salivary form (CA-VI); and a yet uncharacterized isozyme [5]. In the alga
Chlamydomonas reinhardtii, two CA isozymes have been sequenced [6]. They are periplasmic glycoproteins
evolutionarily related to vertebrate CAs. Some bacteria, such as Neisseria gonorrhoeae [7], also have a eukaryotic-type CA.
CAs contain a single zinc atom bound to three conserved histidine residues. As a signature for CAs, a pattern has
been developed which includes one of these zinc-binding histidines. Protein D8 from Vaccinia and other poxviruses is
related to CAs but has lost two of the zinc-binding histidines as well as many otherwise conserved residues. This is
also true of the N-terminal extracellular domain of some receptor-type tyrosine-protein phosphatases (see
<PDOC00323>).

Consensus pattern: S-E-[HN]-x-[LIVM]-x(4)-[FYH]-x(2)-E-[LIVMGA]-H-[LIVMFA](2) [The second H is a zinc ligand]
Note: most prokaryotic CAs as well as plant chloroplast CAs belong to another, evolutionarily distinct family of proteins
(see <PDOC00586>).

[1] Deutsch H.F. Int. J. Biochem. 19:101-113(1987).
[2] Fernley R.T. Trends Biochem. Sci. 13:356-359(1988).
[3] Tashian R.E. BioEssays 10:186-192(1989).
[4] Edwards Y. Biochem. Soc. Trans. 18:171-175(1990).
[5] Skaggs L.A., Bergenhem N.C.H., Venta P.J., Tashian R.E. Gene 126:291-292(1993).
[6] Fujiwara S., Fukuzawa H., Tachiki A., Miyachi S. Proc. Natl. Acad. Sci. U.S.A. 87:9779-9783(1990).
[7] Huang S., Xue Y., Sauer-Eriksson E., Chirica L., Lindskog S., Jonsson B.-H. J. Mol. Biol. 283:301-310(1998).

[0392] 112. Caseins alpha/beta signature

Caseins [1] are the major protein constituent of milk. Caseins can be classified into two families; the first consists of
the kappa-caseins, and the second groups the alpha-s1, alpha-s2, and beta-caseins. The alpha/beta caseins are a
rapidly diverging family of proteins. However, two regions are conserved: a cluster of phosphorylated serine residues
and the signal sequence. The signature pattern has been developed for this family of proteins based upon the last
eight residues of the signal sequence.

Consensus pattern: C-L-[LV]-A-x-A-[LVF]-A

[1] Holt C., Sawyer L. Protein Eng. 2:251-259(1988).

[0393] 113. Catalase signatures

Catalase (EC 1.11.1.6) [1,2,3] is an enzyme, present in all aerobic cells, that decomposes hydrogen peroxide to
molecular oxygen and water. Its main function is to protect cells from the toxic effects of hydrogen peroxide. In eukaryotic
organisms and in some prokaryotes catalase is a molecule composed of four identical subunits. Each of the subunits
binds one protoheme IX group. A conserved tyrosine serves as the heme proximal side ligand. The region around this
residue has been used as a first signature pattern; it also includes a conserved arginine that participates in
heme-binding. A conserved histidine has been shown to be important for the catalytic mechanism of the enzyme. The region
around this residue has been selected as a second signature pattern.

Consensus pattern: R-[LIVMFSTAN]-F-[GASTNP]-Y-x-D-[AST]-[QEH] [Y is the proximal heme-binding ligand]
Consensus pattern: [IF]-x-[RH]-x(4)-[EQ]-R-x(2)-H-x(2)-[GAS]-[GASTF]-[GAST] [H is an active site residue]
Note: some prokaryotic catalases belong to the peroxidase family (see <PDOC00394>).

[1] Murthy M.R.N., Reid T.J. III, Sicignano A., Tanaka N., Rossmann M.G. J. Mol. Biol. 152:465-499(1981).

[2] Melik-Adamyan W.R., Barynin V.V., Vagin A.A., Borisov V.V., Vainshtein B.K., Fita I., Murthy M.R.N., Rossmann
M.G. J. Mol. Biol. 188:63-72(1986).

[3] von Ossowski I., Hausner G., Loewen P.C. J. Mol. Evol. 37:71-76(1993).

[0394] 114. (chitin_binding) Chitin recognition or binding domain signature

A conserved domain of 43 amino acids is found in several plant and fungal proteins that have a common binding
specificity for oligosaccharides of N-acetylglucosamine [1]. This domain may be involved in the recognition or binding
of chitin subunits. It has been found in the proteins listed below.

- A number of non-leguminous plant lectins. The best characterized of these lectins are the three highly homologous
wheat germ agglutinins (WGA-1, 2 and 3). WGA is an N-acetylglucosamine/N-acetylneuraminic acid binding lectin
which structurally consists of a fourfold repetition of the 43 amino acid domain. The same type of structure is found in
a barley root-specific lectin as well as a rice lectin.
- Plant endochitinases (EC 3.2.1.14) from class IA (see <PDOC00620>). Endochitinases are enzymes that catalyze
the hydrolysis of the beta-1,4 linkages of N-acetylglucosamine polymers of chitin. Plant chitinases function as a defense
against chitin-containing fungal pathogens. Class IA chitinases generally contain one copy of the chitin-binding domain
at their N-terminal extremity. An exception is agglutinin/chitinase [2] from the stinging nettle Urtica dioica, which contains
two copies of the domain.
- Hevein [5], a wound-induced protein found in the latex of rubber trees.
- Win1 and win2, two wound-induced proteins from potato.
- Kluyveromyces lactis killer toxin alpha subunit [3]. The toxin encoded by the linear plasmid pGKL1 is composed of
three subunits: alpha, beta, and gamma. The gamma subunit harbors toxin activity and inhibits growth of sensitive
yeast strains in the G1 phase of the cell cycle; the alpha subunit, which is proteolytically processed from a larger
precursor that also contains the beta subunit, is a chitinase (see <PDOC00839>).

In chitinases, as well as in the potato wound-induced proteins, the 43-residue domain directly follows the signal
sequence and is therefore at the N-terminal of the mature protein; in the killer toxin alpha subunit it is located in the central
section of the protein. The domain contains eight conserved cysteine residues which have all been shown, in WGA,
to be involved in disulfide bonds. The topological arrangement of the four disulfide bonds is shown in the following
figure:

[Figure: arrangement of the four disulfide bonds among the eight conserved cysteines of the domain]
'C': conserved cysteine involved in a disulfide bond. '*': position of the pattern.

Consensus pattern: C-x(4,5)-C-C-S-x(2)-G-x-C-G-x(4)-[FYW]-C [The five C's are involved in disulfide bonds]

[1] Wright H.T., Sandrasegaram G., Wright C.S. J. Mol. Evol. 33:283-294(1991).
[2] Lerner D.R., Raikhel N.V. J. Biol. Chem. 267:11085-11091(1992).
[3] Butler A.R., O'Donnell R.W., Martin V.J., Gooday G.W., Stark M.J.R. Eur. J. Biochem. 199:483-488(1991).

[0395] 115. (Chitinase_1) Chitinases family 19 signatures

Chitinases (EC 3.2.1.14) [1] are enzymes that catalyze the hydrolysis of the beta-1,4-N-acetyl-D-glucosamine linkages
in chitin polymers. From the viewpoint of sequence similarity, chitinases belong to either family 18 or 19 in the
classification of glycosyl hydrolases [2,E1]. Chitinases of family 19 (also known as classes IA or I and IB or II) are enzymes
from plants that function in the defense against fungal and insect pathogens by destroying their chitin-containing cell
wall. Class IA/I and IB/II enzymes differ in the presence (IA/I) or absence (IB/II) of an N-terminal chitin-binding domain
(see the relevant entry <PDOC00025>). The catalytic domain of these enzymes consists of about 220 to 230 amino acid
residues. Two highly conserved regions have been selected as signature patterns; the first one is located in the
N-terminal section and contains one of the six cysteines which are conserved in most, if not all, of these chitinases and
which is probably involved in a disulfide bond.

Consensus pattern: C-x(4,5)-F-Y-[ST]-x(3)-[FY]-[LIVMF]-x-A-x(3)-[YF]-x(2)-F-[GSA]
Consensus pattern: [LIVM]-[GSA]-F-x-[STAG](2)-[LIVMFY]-W-[FY]-W-[LIVM]

[1] Flach J., Pilet P.-E., Jolles P. Experientia 48:701-716(1992).

[2] Henrissat B. Biochem. J. 280:309-316(1991).

[0396] 116. chloroa_b-bind

Chlorophyll A-B binding proteins. Number of members: 211
[0397] 117. chromo

The 'chromo' (CHRromatin Organization MOdifier) domain [1 to 4] is a conserved region of about 60 amino acids which
was originally found in Drosophila modifiers of variegation, which are proteins that modify the structure of chromatin
to the condensed morphology of heterochromatin, a cytologically visible condition where gene expression is repressed.
In protein Polycomb, the chromo domain has been shown to be important for chromatin targeting. Proteins that contain
a chromo domain seem to fall into three classes:

a) Proteins which have a N-terminal chromo domain followed by a region which is related to but distinct from the
chromo domain and which has been termed [3] the 'chromo shadow' domain.

b) Proteins with a single chromo domain.

c) Proteins with paired tandem chromo domains.

[0398] Currently this domain has been found in the following proteins:
[0399] Class A:

- Drosophila heterochromatin protein Su(var)205 (HP1).
- Human heterochromatin protein HP1 alpha.
- Mammalian modifier 1 and modifier 2.
- Fission yeast swi6, a protein involved in the repression of the silent mating-type loci mat2 and mat3.

[0400] Class B:

- Drosophila protein Polycomb (Pc).
- Mammalian modifier 3, a homolog of Pc.
- Drosophila protein Su(var)3-9, a suppressor of position-effect variegation.
- Human Mi-2 autoantigen, characteristic of dermatomyositis.
- Fungal retrotransposon polyproteins 'skippy' from Fusarium oxysporum, 'grasshopper' and 'MAGGY' from
Magnaporthe grisea and CfT-1 from Cladosporium fulvum.
- Fission yeast hypothetical protein SpAC18G6.02c.
- Caenorhabditis elegans hypothetical protein C29H12.6.
- Caenorhabditis elegans hypothetical protein ZK1236.2.
- Caenorhabditis elegans hypothetical protein T09A5.8.

[0401] Class C:

- Mammalian DNA-binding/helicase proteins CHD-1 to CHD-4.
- Yeast protein CHD1.

[0402] The signature pattern for this domain corresponds to its best conserved section, which is located in its
central part.

- Consensus partem: jTYLMUVMCHKRJA/V-xf^ 

[ 1] Paro R. Trends Genet. 6:416-421(1990).
[ 2] Singh P.B., Miller J.R., Pearce J., Kothary R., Burton R.D., Paro R., James T.C., Gaunt S.J. Nucleic Acids Res.
19:789-794(1991).
[ 3] Aasland R., Stewart A.F. Nucleic Acids Res. 23:3168-3173(1995).
[ 4] Koonin E.V., Zhou S., Lucchesi J.C. Nucleic Acids Res. 23:4229-4233(1995).

[0403] 118. citrate_synt

Citrate synthase (EC 4.1.3.7) (CS) is the tricarboxylic acid cycle enzyme that catalyzes the synthesis of citrate from
oxaloacetate and acetyl-CoA in an aldol condensation. CS can directly form a carbon-carbon bond in the absence of
metal ion cofactors.

[0404] In prokaryotes, citrate synthase is composed of six identical subunits. In eukaryotes, there are two isozymes
of citrate synthase: one is found in the mitochondrial matrix, the second is cytoplasmic. Both seem to be dimers of
identical chains.

[0405] There are a number of regions of sequence similarity between prokaryotic and eukaryotic citrate synthases.
One of the best conserved contains a histidine which is one of three residues shown [1] to be involved in the catalytic
mechanism of the vertebrate mitochondrial enzyme. This region has been used as a signature pattern.

- Consensus pattern: G-[FYA]-[GA]-H-x-[IV]-x(1,2)-[RKT]-x(2)-D-[PS]-R [H is an active site residue]
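
The consensus patterns in this section follow a PROSITE-style syntax: residues separated by '-', 'x' for any residue, '[...]' for a set of allowed residues, '{...}' for excluded residues, and '(n)' or '(n,m)' for repeat counts. As a minimal sketch, assuming that reading of the syntax, such a pattern can be translated into a regular expression and searched against a protein sequence. The function name `prosite_to_regex` and the test sequence are hypothetical and serve only as illustration; the pattern string is the citrate synthase signature as read from the line above, with the extraction damage repaired as an assumption. Anchors ('<', '>') are not handled.

```python
import re

# One pattern element: a core (x, a residue, [set] or {excluded set})
# optionally followed by a repeat count such as (3) or (1,2).
_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression."""
    pieces = []
    for element in pattern.strip().rstrip(".").split("-"):
        match = _ELEMENT.fullmatch(element.strip())
        if match is None:
            raise ValueError(f"unrecognized pattern element: {element!r}")
        core, low, high = match.groups()
        if core == "x":
            piece = "."                      # any residue
        elif core.startswith("{"):
            piece = "[^" + core[1:-1] + "]"  # any residue except those listed
        else:
            piece = core                     # single residue or [allowed set]
        if low is not None:
            piece += "{" + low + ("," + high if high else "") + "}"
        pieces.append(piece)
    return "".join(pieces)


# Citrate synthase signature (assumed reading; the H is the active site residue).
CITRATE_SYNTHASE = "G-[FYA]-[GA]-H-x-[IV]-x(1,2)-[RKT]-x(2)-D-[PS]-R"

# Hypothetical sequence fragment, constructed only to contain one match.
toy_sequence = "MAGFGHAIVRKADPRTT"

regex = re.compile(prosite_to_regex(CITRATE_SYNTHASE))
hit = regex.search(toy_sequence)
print(regex.pattern)
print(hit.start() + 1, hit.group())  # 1-based start position and matched segment
```

Translating to a regular expression keeps the sketch dependency-free. A production scan would more likely use a dedicated pattern-search tool.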

[0406] [1] Karpusas M., Branchaud B., Remington S.J. Biochemistry 29:2213-2219(1990).
[0407] 119. clpA_B
Chaperonin clpA/B

CAUTION! This family is a subfamily of the AAA superfamily. The threshold has been set very high to stop overlaps 
with the AAA superfamily. This entry will be subsumed by AAA in the future. 
Number of members: 39 

[0408] A number of ATP-binding proteins that are thought to protect cells from extreme stress by controlling the
aggregation or denaturation of vital cellular structures have been shown [1,2] to be evolutionarily related. These
proteins are listed below.

- Escherichia coli clpA, which acts as the regulatory subunit of the ATP-dependent protease clp.
- Rhodopseudomonas blastica clpA homolog.
- Escherichia coli heat shock protein clpB and homologs in other bacteria.
- Bacillus subtilis protein mecB.
- Yeast heat shock protein 104 (gene HSP104), which is vital for tolerance to heat, ethanol and other stresses.
- Neurospora heat shock protein hsp98.
- Yeast mitochondrial heat shock protein 78 (gene HSP78) [3].
- CD4A and CD4B, two highly related tomato proteins that seem to be located in the chloroplast.
- Trypanosoma brucei protein clp.
- Porphyra purpurea chloroplast encoded clpC.

[0409] The size of these proteins ranges from 84 Kd (clpA) to slightly more than 100 Kd (HSP104). They all share
two conserved regions of about 200 amino acids that each contains an ATP-binding site. In addition to the ATP-binding
A and B motifs, there are many parts in these two domains that are also conserved. Two of these regions have been
selected as signature patterns. The first signature is located in the first domain, some ten residues to the C-terminal
of the ATP-binding B motif. The second pattern is located in the second domain, in-between the ATP-binding A and B
motifs.

- Consensus pattern: D-[AI]-[SGA]-N-[LIVMF](2)-K-[PT]-x-L-x(2)-G

- Consensus pattern: R-[LIVMFY]-D-x-S-E-[LIVMFY]-x-E-[KRQ]-x-[STA]-x-[STA]-[KR]-[LIVM]-x-G-[STA]

[ 1] Gottesman S., Squires C., Pichersky E., Carrington M., Hobbs M., Mattick J.S., Dalrymple B., Kuramitsu H.,
Shiroza T., Foster T., Clark W.P., Ross B., Squires C.L., Maurizi M.R. Proc. Natl. Acad. Sci. U.S.A. 87:3513-3517
(1990).
[ 2] Parsell D.A., Sanchez Y., Stitzel J.D., Lindquist S. Nature 353:270-273(1991).
[ 3] Leonhardt S.A., Fearon K., Danese P.N., Mason T.L. Mol. Cell. Biol. 13:6304-6313(1993).

[0410] 120. cofilin_ADF 
Cofilin/tropomyosin-type actin-binding proteins

[1]
Medline: 97290449
Structure determination of yeast cofilin.
Fedorov AA, Lappalainen P, Fedorov EV, Drubin DG, Almo SC;
Nat Struct Biol 1997;4:366-369.
[2]
Medline: 97290450
Crystal structure of the actin-binding protein actophorin from Acanthamoeba.
Leonard SA, Gittis AG, Petrella EC, Pollard TD, Lattman;
Nat Struct Biol 1997;4:369-373.
[3]
Medline: 97420794
F-actin and G-actin binding are uncoupled by mutation of conserved tyrosine residues in maize actin depolymerizing
factor.
Jiang CJ, Weeds AG, Khan S, Hussey PJ;
Proc Natl Acad Sci USA 1997;94:9973-9978.
[4]
Medline: 97387155
Cofilin promotes rapid actin filament turnover in vivo.
Lappalainen P, Drubin DG;
Nature 1997;388:78-82.
Severs actin filaments and binds to actin monomers.
Number of members: 44

[0411] Actin-depolymerizing proteins sever actin filaments (F-actin) and/or bind to actin monomers, or G-actin, thus
preventing actin-polymerization by sequestering the monomers. The following proteins are evolutionarily related and
belong to a family of low molecular weight (137 to 166 residues) actin-depolymerizing proteins [1,2,3,4]:

- Cofilin from vertebrates, slime mold and yeast. Cofilin binds to F-actin and acts as a pH-dependent
actin-depolymerizing protein.
- Destrin from vertebrates. Destrin binds to G-actin in a pH-independent manner and prevents polymerization.
- Caenorhabditis elegans unc-60.
- Acanthamoeba castellanii actophorin.
- Plants actin depolymerizing factor (ADF).

[0412] The most conserved region of these proteins is a twenty amino-acid segment that ends some 30 residues
from their C-terminal extremity. This segment has been shown [5] to be important for actin-binding.

- Consensus pattern: P-[DE]-x-[SA]-x-[LIVMT]-[KR]-x-[KR]-M-[LIVM]-[YA]-[STA](3)-x(3)-[LIVMF]-[KR]

[ 1] Hawkins M., Pope B., Maciver S.K., Weeds A.G. Biochemistry 32:9985-9993(1993).
[ 2] Iida K., Moriyama K., Matsumoto S., Kawasaki H., Nishida E., Yahara I. Gene 124:115-120(1993).
[ 3] Quirk S., Maciver S.K., Ampe C., Doberstein S.K., Kaiser D.A., van Damme J., Vandekerckhove J., Pollard T.D.
Biochemistry 32:8525-8533(1993).
[ 4] McKim K.S., Matheson C., Marra M.A., Wakarchuk M.F., Baillie D.L. Mol. Gen. Genet. 242:346-357(1994).
[ 5] Moriyama K., Yonezawa N., Sakai H., Yahara I., Nishida E. J. Biol. Chem. 267:7240-7244(1992).

[0413] 121. (Complex 24kd) Respiratory-chain NADH dehydrogenase 24 Kd subunit signature

Respiratory-chain NADH dehydrogenase (EC 1.6.5.3) [1,2] (also known as complex I or NADH-ubiquinone oxidoreductase)
is an oligomeric enzymatic complex located in the inner mitochondrial membrane which also seems to exist in the
chloroplast and in cyanobacteria (as a NADH-plastoquinone oxidoreductase). Among the 25 to 30 polypeptide subunits of
this bioenergetic enzyme complex there is one with a molecular weight of 24 Kd (in mammals), which is a component of
the iron-sulfur (IP) fragment of the enzyme. It seems to bind a 2Fe-2S iron-sulfur cluster. The 24 Kd subunit is
nuclear encoded, as a precursor form with a transit peptide, in mammals and in Neurospora crassa. The 24 Kd subunit
is highly similar to [3,4]:

- Subunit E of Escherichia coli NADH-ubiquinone oxidoreductase (gene nuoE).
- Subunit NQO2 of Paracoccus denitrificans NADH-ubiquinone oxidoreductase.

A highly conserved region located in the central section of this subunit, containing two conserved cysteines that are
probably involved in the binding of the 2Fe-2S center, has been selected as a signature pattern.

- Consensus pattern: D-x(2)-F-[ST]-x(5)-C-L-G-x-C-x(2)-[GA]-P [The two C's are putative 2Fe-2S ligands]



[ 1] Ragan C.I. Curr. Top. Bioenerg. 15:1-38(1987).
[ 2] Weiss H., Friedrich T., Hofhaus G., Preis D. Eur. J. Biochem. 197:563-576(1991).
[ 3] Fearnley I.M., Walker J.E. Biochim. Biophys. Acta 1140:105-134(1992).
[ 4] Weidner U., Geier S., Ptock A., Friedrich T., Leif H., Weiss H. J. Mol. Biol. 233:109-122(1993).

[0414] 122. copper-bind 

Copper binding proteins, plastocyanin/azurin family 
Number of members: 70 

[0415] Blue or 'type-1' copper proteins are small proteins which bind a single copper atom and which are
characterized by an intense electronic absorption band near 600 nm [1,2]. The most well known members of this class
of proteins are the plant chloroplastic plastocyanins, which exchange electrons with cytochrome c6, and the distantly
related bacterial azurins, which exchange electrons with cytochrome c-551. This family of proteins also includes all
the proteins listed below (references are only provided for recently determined sequences).

- Amicyanin from bacteria such as Methylobacterium extorquens or Thiobacillus versutus that can grow on
methylamine. Amicyanin appears to be an electron receptor for methylamine dehydrogenase.
- Auracyanins A and B from Chloroflexus aurantiacus [3]. These proteins can donate electrons to cytochrome c-554.
- Blue copper protein from Alcaligenes faecalis.
- Cupredoxin (CPC) from cucumber peelings [4].
- Cusacyanin (basic blue protein; plantacyanin, CBP) from cucumber.
- Halocyanin from Natrobacterium pharaonis [5], a membrane associated copper-binding protein.
- Pseudoazurin from Pseudomonas.
- Rusticyanin from Thiobacillus ferrooxidans. Rusticyanin is an electron carrier from cytochrome c-552 to the a-type
oxidase.
- Stellacyanin from the Japanese lacquer tree.
- Umecyanin from horseradish roots.
- Allergen Ra3 from ragweed. This pollen protein is evolutionarily related to the above proteins, but seems to have
lost the ability to bind copper.

[0416] Although there is an appreciable amount of divergence in the sequence of all these proteins, the copper ligand
sites are conserved and a pattern which includes two of the ligands (a cysteine and a histidine) has been developed.

- Consensus pattern: [GA]-x(0,2)-[YSA]-x(0,1)-[VFY]-x-C-x(1,2)-[PG]-x(0,1)-H-x(2,4)-[MQ] [C and H are copper
ligands]

[ 1] Garrett T.P.J., Clingeleffer D.J., Guss J.M., Rogers S.J., Freeman H.C. J. Biol. Chem. 259:2822-2825(1984).
[ 2] Ryden L.G., Hunt L.T. J. Mol. Evol. 36:41-66(1993).
[ 3] McManus J.D., Brune D.C., Han J., Sanders-Loehr J., Meyer T.E., Cusanovich M.A., Tollin G., Blankenship R.E.
J. Biol. Chem. 267:6531-6540(1992).
[ 4] Mann K., Schaefer W., Thoenes U., Messerschmidt A., Mehrabian Z., Nalbandyan R. FEBS Lett. 314:220-223
(1992).
[ 5] Mattar S., Scharf B., Kent S.B.H., Rodewald K., Oesterhelt D., Engelhard M. J. Biol. Chem. 269:14939-14945
(1994).
[ 6] Yano T., Fukumori Y., Yamanaka T. FEBS Lett. 288:159-162(1991).

[0417] 123. Chaperonins cpn10 signature

Chaperonins [1,2] are proteins involved in the folding of proteins or the assembly of oligomeric protein complexes.
They seem to assist other polypeptides in maintaining or assuming conformations which permit their correct assembly
into oligomeric structures. They are found in abundance in prokaryotes, chloroplasts and mitochondria. Chaperonins
form oligomeric complexes and are composed of two different types of subunits: a 60 Kd protein, known as cpn60
(groEL in bacteria), and a 10 Kd protein, known as cpn10 (groES in bacteria). The cpn10 protein binds to cpn60 in the
presence of MgATP and suppresses the ATPase activity of the latter. Cpn10 is a protein of about 100 amino acid
residues whose sequence is well conserved in bacteria, vertebrate mitochondria and plants chloroplast [3,4]. Cpn10
assembles as an heptamer that forms a dome [5]. As a signature pattern for cpn10, a region located in the N-terminal
section of the protein was selected.

Consensus pattern: [LIVMFY]-x-P-[ILT]-x-[DEN]-[KR]-[LIVMFA](3)-[KREQ]-x(8,9)-[SG]-x-[LIVMFY](3)
Note: this pattern is found twice in the plant chloroplast protein, which consists of the tandem repeat of a cpn10 domain.
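
The note above implies that counting signature matches can flag proteins built from repeated domains. The following is a minimal sketch, assuming the cpn10 signature reads [LIVMFY]-x-P-[ILT]-x-[DEN]-[KR]-[LIVMFA](3)-[KREQ]-x(8,9)-[SG]-x-[LIVMFY](3) and translating it by hand into a regular expression. The function name and the toy sequence are hypothetical; the sequence is not a real chloroplast protein.

```python
import re

# Hand-translated regular expression for the assumed cpn10 signature.
# Wrapping it in a lookahead lets overlapping matches be reported as well.
CPN10_SIGNATURE = re.compile(
    r"(?=([LIVMFY].P[ILT].[DEN][KR][LIVMFA]{3}[KREQ].{8,9}[SG].[LIVMFY]{3}))"
)


def signature_starts(sequence: str) -> list[int]:
    """Return 1-based start positions of every match, overlapping ones included."""
    # The lookahead consumes no residues, so each start position is tested in turn.
    return [m.start() + 1 for m in CPN10_SIGNATURE.finditer(sequence)]


# Hypothetical domain that satisfies the signature, repeated in tandem.
toy_domain = "LKPLGDRVLVK" + "A" * 8 + "GAIVL"
toy_protein = "MS" + toy_domain + "TTTT" + toy_domain + "E"

starts = signature_starts(toy_protein)
print(starts)
print("tandem repeat candidate" if len(starts) >= 2 else "single domain")
```

Two or more hits only suggest a repeated domain; a single signature match is not proof of family membership either.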

[ 1] Ellis R.J., van der Vies S.M. Annu. Rev. Biochem. 60:321-347(1991).
[ 2] Zeilstra-Ryalls J., Fayet O., Georgopoulos C. Annu. Rev. Microbiol. 45:301-325(1991).
[ 3] Hartman D.J., Hoogenraad N.J., Condron R., Hoj P.B. Proc. Natl. Acad. Sci. U.S.A. 89:3394-3398(1992).
[ 4] Bertsch U., Soll J., Seetharam R., Viitanen P.V. Proc. Natl. Acad. Sci. U.S.A. 89:8696-8700(1992).
[ 5] Hunt J.F., Weaver A.J., Landry S.J., Gierasch L., Deisenhofer J. Nature 379:37-45(1996).

[0418] 124. Chaperonins cpn60 signature (cpn60_TCP1)

Chaperonins [1,2] are proteins involved in the folding of proteins or the assembly of oligomeric protein complexes.
Their role seems to be to assist other polypeptides to maintain or assume conformations which permit their correct
assembly into oligomeric structures. They are found in abundance in prokaryotes, chloroplasts and mitochondria.
Chaperonins form oligomeric complexes and are composed of two different types of subunits: a 60 Kd protein, known as
cpn60 (groEL in bacteria), and a 10 Kd protein, known as cpn10 (groES in bacteria). The cpn60 protein shows weak
ATPase activity and is a highly conserved protein of about 550 to 580 amino acid residues which has been described
by different names in different species:

- Escherichia coli groEL protein, which is essential for the growth of the bacteria and the assembly of several
bacteriophages.
- Cyanobacterial groEL analogues.
- Mycobacterium tuberculosis and leprae 65 Kd antigen, Coxiella burnetii heat shock protein B (gene htpB), Rickettsia
tsutsugamushi major antigen 58 and Chlamydial 57 Kd hypersensitivity antigen (gene hypB).
- Chloroplast RuBisCO subunit binding-protein alpha and beta chains, which bind ribulose bisphosphate carboxylase
small and large subunits and are implicated in the assembly of the enzyme oligomer.
- Mammalian mitochondrial matrix protein P1 (mitonin or P60).
- Yeast HSP60 protein, a mitochondrial assembly factor.

As a signature pattern for these proteins, a rather well-conserved region of twelve residues, located in the last
third of the cpn60 sequence, was chosen.
Consensus pattern: A-[AS]-x-[DEQ]-E-x(4)-G-G-[GA]

[ 1] Ellis R.J., van der Vies S.M. Annu. Rev. Biochem. 60:321-347(1991).
[ 2] Zeilstra-Ryalls J., Fayet O., Georgopoulos C. Annu. Rev. Microbiol. 45:301-325(1991).

[0419] Chaperonins TCP-1 signatures (cpn60_TCP1)

The TCP-1 protein [1,2] (Tailless Complex Polypeptide 1) was first identified in mice, where it is especially abundant
in testis but present in all cell types. It has since been found and characterized in many other mammalian species, in
Drosophila and in yeast. TCP-1 is a highly conserved protein of about 60 Kd (556 to 560 residues) which participates
in a hetero-oligomeric 900 Kd double-torus shaped particle [3] with 6 to 8 other different subunits. These subunits,
the chaperonin containing TCP-1 (CCT) subunit beta, gamma, delta, epsilon, zeta and eta, are evolutionarily related to
TCP-1 itself [4,5]. The CCT is known to act as a molecular chaperone for tubulin, actin and probably some other
proteins.
[0420] The CCT subunits are highly related to archaebacterial counterparts:

- TF55 and TF56 [6], a molecular chaperone from Sulfolobus shibatae. TF55 has ATPase activity, is known to bind
unfolded polypeptides and forms an oligomeric complex of two stacked nine-membered rings.
- Thermosome [7], from Thermoplasma acidophilum. The thermosome is composed of two subunits (alpha and beta) and
also seems to be a chaperone with ATPase activity. It forms an oligomeric complex of eight-membered rings.

The TCP-1 family of proteins is weakly, but significantly [8], related to the cpn60/groEL chaperonin family (see
<PDOC00268>). As signature patterns of this family of chaperonins, three conserved regions located in the N-terminal
domain were chosen.
Consensus pattern: [RKEL]-[ST]-x-[LMFY]-G-P-x-[GSA]-x-x-K-[LIVMF](2)
Consensus pattern: [LIVM]-[TS]-[NK]-D-[GA]-[AVNHK]-[TAV]-[LIVM](2)-x(2)-[LIVM]-x-[LIVM]-x-[SNH]-[PQH]
Consensus pattern: Q-[DEK]-x-x-[LIVMGTA]-[GA]-D-G-T

[ 1] Ellis J. Nature 358:191-192(1992).
[ 2] Nelson R.J., Craig E.A. Curr. Biol. 2:487-489(1992).
[ 3] Lewis V.A., Hynes G.M., Zheng D., Saibil H., Willison K.R. Nature 358:249-252(1992).
[ 4] Kubota H., Hynes G., Carne A., Ashworth A., Willison K.R. Curr. Biol. 4:89-99(1994).
[ 5] Kim S., Willison K.R., Horwich A.L. Trends Biochem. Sci. 20:543-548(1994).
[ 6] Trent J.D., Nimmesgern E., Wall J.S., Hartl F.U., Horwich A.L. Nature 354:490-493(1991).
[ 7] Waldmann T., Lupas A., Kellermann J., Peters J., Baumeister W. Biol. Chem. Hoppe-Seyler 376:119-126(1995).
[ 8] Hemmingsen S.M. Nature 357:650-650(1992).

[0421] 125. cyclin (Cyclins)

The cyclins include an internal duplication, which is related to that found in TFIIB and the RB protein.
[1]
Medline: 94203808
Evidence for a protein domain superfamily shared by the cyclins, TFIIB and RB/p107.
Gibson TJ, Thompson JD, Blocker A, Kouzarides T;
Nucleic Acids Res 1994;22:946-952.
[2]
Medline: 96164440
The crystal structure of cyclin A.
Brown NR, Noble MEM, Endicott JA, Garman EF, Wakatsuki S,
Mitchell E, Rasmussen B, Hunt T, Johnson LN;
Structure 1995;3:1235-1247.
Complex of cyclin and cyclin-dependent kinase.
[3]
Medline: 96313126
Structural basis of cyclin-dependent kinase activation by phosphorylation.
Russo AA, Jeffrey PD, Pavletich NP;
Nat Struct Biol 1996;3:696-700.
Cyclins regulate cyclin-dependent kinases (CDKs).

The most divergent prosite members have been included. Swiss:P22674, the Uracil-DNA glycosylase 2, is the highest
noise and may be related but has not been included.
Number of members: 189

[0422] Cyclins [1,2,3] are eukaryotic proteins which play an active role in controlling nuclear cell division cycles.
Cyclins, together with the p34 (cdc2) or cdk2 kinases, form the Maturation Promoting Factor (MPF). There are two
main groups of cyclins:

- G2/M cyclins, essential for the control of the cell cycle at the G2/M (mitosis) transition. G2/M cyclins accumulate
steadily during G2 and are abruptly destroyed as cells exit from mitosis (at the end of the M-phase).
- G1/S cyclins, essential for the control of the cell cycle at the G1/S (start) transition.

[0423] In most species, there are multiple forms of G1 and G2 cyclins. For example, in vertebrates, there are two
G2 cyclins, A and B, and at least three G1 cyclins, C, D, and E.

[0424] A cyclin homolog has also been found in herpesvirus saimiri [4].

[0425] The best conserved region is in the central part of the cyclins' sequences, known as the 'cyclin-box'. From
this, a 32 residue pattern has been derived.

- Consensus pattern: R-xf2HUVMSA3-x(2HFYWSHLIV^ 
F Y Q]-x-[ L I VM F Y C]- [ L I VM F Y]-D -[R K N ]-[ L i V M F Y Wj 

[ 1] Nurse P. Nature 344:503-508(1990).
[ 2] Norbury C., Nurse P. Curr. Biol. 1:23-24(1991).
[ 3] Lew D.J., Reed S.J. Trends Cell Biol. 2:77-81(1992).
[ 4] Nicholas J., Cameron K.R., Honess R.W. Nature 355:362-366(1992).

[0426] 126. Cystatin domain

This is a very diverse family. Attempts to define separate subfamilies have failed. Typically, either the N-terminal
or C-terminal end is very divergent. But splitting into two domains would make very short families. Cathelicidins are
related to this family but have not been included. Number of members: 147

[0427] Inhibitors of cysteine proteases [1,2,3], which are found in the tissues and body fluids of animals, in the
larva of the worm Onchocerca volvulus [4], as well as in plants, can be grouped into three distinct but related families:

- Type 1 cystatins (or stefins), molecules of about 100 amino acid residues with neither disulfide bonds nor
carbohydrate groups.
- Type 2 cystatins, molecules of about 115 amino acid residues which contain one or two disulfide loops near their
C-terminus.
- Kininogens, which are multifunctional plasma glycoproteins.

[0428] They are the precursor of the active peptide bradykinin and play a role in blood coagulation by helping to
position optimally prekallikrein and factor XI next to factor XII. They are also inhibitors of cysteine proteases.
Structurally, kininogens are made of three contiguous type-2 cystatin domains, followed by an additional domain (of
variable length) which contains the sequence of bradykinin. The first of the three cystatin domains seems to have
lost its inhibitory activity.

[0429] In all these inhibitors, there is a conserved region of five residues which has been proposed to be important
for the binding to the cysteine proteases. The consensus pattern starts one residue before this conserved region.

- Consensus pattern: [GSTEQKRV]-Q-[LIVT]-[VAF]-[SAGQ]-G-x-[LIVMNK]-x(2)-[LIVMFY]-x-[LIVMFYA]-[DENQKRHSIV]

[ 1] Barrett A.J. Trends Biochem. Sci. 12:193-196(1987).
[ 2] Rawlings N.D., Barrett A.J. J. Mol. Evol. 30:60-71(1990).
[ 3] Turk V., Bode W. FEBS Lett. 285:213-219(1991).
[ 4] Lustigman S., Brotman B., Huima T., Prince A.M. Mol. Biochem. Parasitol. 45:65-76(1991).

[0430] 127. cytochrome_c (Cytochrome c)

The Pfam entry does not include all prosite members.

The cytochrome 558 and cytochrome c' families are not included. 
Number of members: 259 

[0431] In proteins belonging to cytochrome c family [1], the heme group is covalently attached by thioether bonds to
two conserved cysteine residues. The consensus sequence for this site is Cys-X-X-Cys-His and the histidine residue
is one of the two axial ligands of the heme iron. This arrangement is shared by all proteins known to belong to
cytochrome c family, which presently includes cytochromes c, c', c1 to c6, c550 to c556, cc3/Hmc, cytochrome f and
reaction center cytochrome c.

- Consensus pattern: C-{CPWHF}-{CPWR}-C-H-{CFYW}
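
The Cys-X-X-Cys-His arrangement described above lends itself to a direct scan for candidate heme attachment sites. The following is a minimal sketch that uses only that core motif, not the full signature with its excluded residues, so it will over-report. The function name and the toy sequence are hypothetical.

```python
import re

# Core heme-attachment motif from the text: Cys-X-X-Cys-His.
# The lookahead allows overlapping motifs to be reported.
CXXCH = re.compile(r"(?=C..CH)")


def heme_site_candidates(sequence: str) -> list[dict[str, int]]:
    """List 1-based positions of the two Cys and the axial His for each CXXCH hit."""
    sites = []
    for m in CXXCH.finditer(sequence):
        first = m.start() + 1  # 1-based position of the first cysteine
        sites.append({"cys1": first, "cys2": first + 3, "his": first + 4})
    return sites


# Hypothetical sequence containing two CXXCH motifs.
toy_sequence = "MKAAGCAACHDEEKLLCSSCHGT"

for site in heme_site_candidates(toy_sequence):
    print(site)
```

Reporting the residue positions rather than the matched substring makes it easier to compare hits against annotated ligand positions.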

[0432] [1] Mathews F.S. Prog. Biophys. Mol. Biol. 45:1-56(1985).

[0433] 128. (DAGKa) Diacylglycerol kinase accessory domain (presumed)

[0434] Diacylglycerol (DAG) is a second messenger that acts as a protein kinase C activator. This domain is assumed
to be an accessory domain; its function is unknown.

[0435] [1] Sakane F, Yamada K, Kanoh H, Yokoyama C, Tanabe T, Nature 1990;344:345-348. [2] Sakane F, Imai S,
Kai M, Wada I, Kanoh H, J Biol Chem 1996;271:8394-8401. [3] Schaap D, de Widt J, van der Wal J, Vandekerckhove
J, van Damme J, Gussow D, Ploegh HL, van Blitterswijk WJ, van der Bend RL, FEBS Lett 1990;275:151-158. [4]
Kanoh H, Yamada K, Sakane F, Trends Biochem Sci 1990;15:47-50.
[0436] 129. (DAGKc) Diacylglycerol kinase catalytic domain (presumed)

[0437] Diacylglycerol (DAG) is a second messenger that acts as a protein kinase C activator. The catalytic domain
is assumed from the finding of bacterial homologues.

[0438] [1] Sakane F, Yamada K, Kanoh H, Yokoyama C, Tanabe T, Nature 1990;344:345-348. [2] Sakane F, Imai S,
Kai M, Wada I, Kanoh H, J Biol Chem 1996;271:8394-8401. [3] Schaap D, de Widt J, van der Wal J, Vandekerckhove
J, van Damme J, Gussow D, Ploegh HL, van Blitterswijk WJ, van der Bend RL, FEBS Lett 1990;275:151-158. [4]
Kanoh H, Yamada K, Sakane F, Trends Biochem Sci 1990;15:47-50.
[0439] 130. D-amino acid oxidases signature (DAO)

[0440] D-amino acid oxidase (EC 1.4.3.3) (DAMOX or DAO) is an FAD flavoenzyme that catalyzes the oxidation of
neutral and basic D-amino acids into their corresponding keto acids. DAOs have been characterized and sequenced
in fungi and vertebrates, where they are known to be located in the peroxisomes. D-aspartate oxidase (EC 1.4.3.1)
(DASOX) [1] is an enzyme, structurally related to DAO, which catalyzes the same reaction but is active only toward
dicarboxylic D-amino acids. In DAO, a conserved histidine has been shown [2] to be important for the enzyme's catalytic
activity. The conserved region around this residue has been developed as a signature pattern for these enzymes.
[0441] Consensus pattern: [LIVM](2)-H-[NHA]-Y-G-x-[GSA](2)-x-G-x(5)-G-x-A [H is a probable active site residue]

[ 1] Negri A., Ceciliani F., Tedeschi G., Simonic T., Ronchi S. J. Biol. Chem. 267:11865-11871(1992).
[ 2] Miyano M., Fukui K., Watanabe F., Takahashi S., Tada M., Kanashiro M., Miyake Y. J. Biochem. 109:171-177
(1991).

[0442] 131. DEAD and DEAH box families ATP-dependent helicases signatures

A number of eukaryotic and prokaryotic proteins have been characterized [1,2,3] on the basis of their structural
similarity. They all seem to be involved in ATP-dependent, nucleic-acid unwinding. Proteins currently known to belong
to this family are:

- Initiation factor eIF-4A. Found in eukaryotes, this protein is a subunit of a high molecular weight complex involved
in 5'cap recognition and the binding of mRNA to ribosomes. It is an ATP-dependent RNA-helicase.
- PRP5 and PRP28. These yeast proteins are involved in various ATP-requiring steps of the pre-mRNA splicing process.
- PL10, a mouse protein expressed specifically during spermatogenesis.
- An3, a Xenopus putative RNA helicase, closely related to PL10.
- SPP81/DED1 and DBP1, two yeast proteins probably involved in pre-mRNA splicing and related to PL10.
- Caenorhabditis elegans helicase glh-1.
- MSS116, a yeast protein required for mitochondrial splicing.
- SPB4, a yeast protein involved in the maturation of 25S ribosomal RNA.
- p68, a human nuclear antigen. p68 has ATPase and DNA-helicase activities in vitro. It is involved in cell growth
and division.
- Rm62 (p62), a Drosophila putative RNA helicase related to p68.
- DBP2, a yeast protein related to p68.
- DHH1, a yeast protein.
- DRS1, a yeast protein involved in ribosome assembly.
- MAK5, a yeast protein involved in maintenance of dsRNA killer plasmid.
- ROK1, a yeast protein.
- ste13, a fission yeast protein.
- Vasa, a Drosophila protein important for oocyte formation and specification of embryonic posterior structures.
- Me31B, a Drosophila maternally expressed protein of unknown function.
- dbpA, an Escherichia coli putative RNA helicase.
- deaD, an Escherichia coli putative RNA helicase which can suppress a mutation in the rpsB gene for ribosomal
protein S2.
- rhlB, an Escherichia coli putative RNA helicase.
- rhlE, an Escherichia coli putative RNA helicase.
- srmB, an Escherichia coli protein that shows RNA-dependent ATPase activity. It probably interacts with 23S
ribosomal RNA.
- Caenorhabditis elegans hypothetical proteins T26G10.1, ZK512.2 and ZK686.2.
- Yeast hypothetical protein YHR065c.
- Yeast hypothetical protein YHR169w.
- Fission yeast hypothetical protein SpAC31A2.07c.
- Bacillus subtilis hypothetical protein yxiN.

All these proteins share a number of conserved sequence motifs. Some of them are specific to this family while others
are shared by other ATP-binding proteins or by proteins belonging to the helicases 'superfamily' [4,E1]. One of these
motifs, called the 'D-E-A-D-box', represents a special version of the B motif of ATP-binding proteins. Some other
proteins belong to a subfamily which have His instead of the second Asp and are thus said to be 'D-E-A-H-box'
proteins [3,5,6,E1]. Proteins currently known to belong to this subfamily are:

- PRP2, PRP16, PRP22 and PRP43. These yeast proteins are all involved in various ATP-requiring steps of the pre-mRNA
splicing process.
- Fission yeast prh1, which may be involved in pre-mRNA splicing.
- Male-less (mle), a Drosophila protein required in males for dosage compensation of X chromosome linked genes.
- RAD3 from yeast. RAD3 is a DNA helicase involved in excision repair of DNA damaged by UV light, bulky adducts or
cross-linking agents. Fission yeast rad15 (rhp3) and mammalian DNA excision repair protein XPD (ERCC-2) are the
homologs of RAD3.
- Yeast CHL1 (or CTF1), which is important for chromosome transmission and normal cell cycle progression in G(2)/M.
- Yeast TPS1.
- Yeast hypothetical protein YKL078w.
- Caenorhabditis elegans hypothetical proteins C06E1.10 and K03H1.2.
- Poxviruses' early transcription factor 70 Kd subunit, which acts with RNA polymerase to initiate transcription from
early gene promoters.
- I8, a putative vaccinia virus helicase.
- hrpA, an Escherichia coli putative RNA helicase.

Signature patterns for both subfamilies were developed.

[0443] Consensus pattern: [LIVMF](2)-D-E-A-D-[RKEN]-x-[LIVMFYGSTN]
Consensus pattern: [GSAH]-x-[LIVMF](3)-D-E-[ALIV]-H-[NECR]

Note: proteins belonging to this family also contain a copy of the ATP/GTP-binding motif 'A' (P-loop) (see the relevant
entry <PDOC00017>).

[ 1] Schmid S.R., Linder P. Mol. Microbiol. 6:283-292(1992).
[ 2] Linder P., Lasko P., Ashburner M., Leroy P., Nielsen P.J., Nishi K., Schnier J., Slonimski P.P. Nature 337:121-122
(1989).
[ 3] Wassarman D.A., Steitz J.A. Nature 349:463-464(1991).
[ 4] Hodgman T.C. Nature 333:22-23(1988) and Nature 333:578-578(1988) (Errata).
[ 5] Harosh I., Deschavanne P. Nucleic Acids Res. 19:6331-6331(1991).
[ 6] Koonin E.V., Senkevich T.G. J. Gen. Virol. 73:989-993(1992).

[0444] 132. (DHBP_synthase) 3,4-dihydroxy-2-butanone 4-phosphate synthase

[0445] 3,4-Dihydroxy-2-butanone 4-phosphate is biosynthesized from ribulose 5-phosphate and serves as the
biosynthetic precursor for the xylene ring of riboflavin. Sometimes found as a bifunctional enzyme with GTP_cyclohydro2.
[0446] Richter G, Krieger C, Volk R, Kis K, Ritz H, Gotze E, Bacher A, Methods Enzymol 1997;280:374-382
[0447] 133. (DHDPS) Dihydrodipicolinate synthetase signatures

Dihydrodipicolinate synthetase (EC 4.2.1.52) (DHDPS) [1] catalyzes, in higher plants chloroplast and in many bacteria
(gene dapA), the first reaction specific to the biosynthesis of lysine and of diaminopimelate. DHDPS is responsible for
the condensation of aspartate semialdehyde and pyruvate by a ping-pong mechanism in which pyruvate first binds to
the enzyme by forming a Schiff-base with a lysine residue. Three other proteins are structurally related to DHDPS and
probably also act via a similar catalytic mechanism:

- Escherichia coli N-acetylneuraminate lyase (EC 4.1.3.3) (gene nanA), which catalyzes the condensation of
N-acetyl-D-mannosamine and pyruvate to form N-acetylneuraminate.
- Rhizobium meliloti protein mosA [3], which is involved in the biosynthesis of the rhizopine
3-O-methyl-scyllo-inosamine.
- Escherichia coli hypothetical protein yjhH.

Two signature patterns for these enzymes were developed. The first one is centered on a highly conserved region in
the N-terminal part of these proteins. The second signature contains a lysine residue which has been shown, in
Escherichia coli dapA, to be the one that forms a Schiff-base with the substrate.

[0448] Consensus pattern: [GSA]-[LIVM]-[LIVMFY]-x(2)-G-[ST]-[TG]-G-E-[GASNF]-x(6)-[EQ]

Consensus pattern: Y-[DNS]-[LIVMFA]-P-x(2)-[ST]-x(3)-[LIVMG]-x(13,14)-[LIVM]-x-[SGA]-[LIVMF]-K-[DEQAF]-[STAC] [K is involved in Schiff-base formation]

[1] Kaneko T., Hashimoto T., Kumpaisal R., Yamada Y. J. Biol. Chem. 265:17451-17455(1990).
[2] Laber B., Gomis-Rueth F.-X., Romao M.J., Huber R. Biochem. J. 288:691-695(1992).

[3] Murphy P.J., Trenz S.P., Grzemski W., de Bruijn F.J., Schell J. J. Bacteriol. 175:5193-5204(1993).

[0449] 134. (DHOdehase) Dihydroorotate dehydrogenase signatures

Dihydroorotate dehydrogenase (EC 1.3.3.1) (DHOdehase) catalyzes the fourth step in the de novo biosynthesis of pyrimidine, the conversion of dihydroorotate into orotate. DHOdehase is a ubiquitous FAD flavoprotein. In bacteria (gene pyrD), DHOdehase is located on the inner side of the cytosolic membrane. In some yeasts, such as Saccharomyces cerevisiae (gene URA1), it is a cytosolic protein, while in other eukaryotes it is found in the mitochondria [1]. The sequence of DHOdehase is rather well conserved and two signature patterns specific to this enzyme were developed. The first corresponds to a region in the N-terminal section of the enzyme, while the second is located in the C-terminal section and seems to be part of the FAD-binding domain.

Consensus pattern: [GS]-x(4)-[GK]-[GSTA]-[LIVFSTA]-[GT]-x(3)-[NQR]-x-G-[NHY]-x(2)-P-[RT]
[0450] Consensus pattern: [LIVM](2)-[GSA]-x-G-G-[IV]-x-[STGDN]-x(3)-[ACV]-x(6)-G-A
[0451] [1] Nagy M., Lacroute F., Thomas D. Proc. Natl. Acad. Sci. U.S.A. 89:8966-8970(1992).
[0452] 135. (DMRL_synthase) 6,7-dimethyl-8-ribityllumazine synthase

[0453] 136. (DNA_methylase) C-5 cytosine-specific DNA methylases signatures

C-5 cytosine-specific DNA methylases (EC 2.1.1.73) (C5 Mtase) are enzymes that specifically methylate the C-5 carbon of cytosines in DNA [1,2,3]. Such enzymes are found in the proteins described below.

- As a component of type II restriction-modification systems in prokaryotes and some bacteriophages. Such enzymes recognize a specific DNA sequence where they methylate a cytosine. In doing so, they protect DNA from cleavage by type II restriction enzymes that recognize the same sequence. The sequences of a large number of type II C-5 Mtases are known.
- In vertebrates, there are a number of C-5 Mtases that methylate CpG dinucleotides. The sequence of the mammalian enzyme is known.

C-5 Mtases share a number of short conserved regions. Two of them were selected. The first is centered around a conserved Pro-Cys dipeptide in which the cysteine has been shown [4] to be involved in the catalytic mechanism; it appears to form a covalent intermediate with the C6 position of cytosine. The second region is located at the C-terminal extremity in type-II enzymes.

[0454] Consensus pattern: [DENKS]-x-[FLIV]-x(2)-[GSTC]-x-P-C-x(2)-[FYWLIM]-S [C is the active site residue]
Consensus pattern: [RKQGTF]-x(2)-G-N-[STAG]-[LIVMF]-x(3)-[LIVMT]-x(3)-[LIVM]-x(3)-[LIVM]

[1] Posfai J., Bhagwat A.S., Roberts R.J. Gene 74:261-265(1988).
[2] Kumar S., Cheng X., Klimasauskas S., Mi S., Posfai J., Roberts R.J., Wilson G.G. Nucleic Acids Res. 22:1-10(1994).

[3] Lauster R., Trautner T.A., Noyer-Weidner M. J. Mol. Biol. 206:305-312(1989).

[4] Chen L., McMillan A.M., Chang W., Ezaz-Nikpay K., Lane W.S., Verdine G.L. Biochemistry 30:11018-11025(1991).

[0455] 137. (DNA_photolyase) DNA photolyases class 2 signatures

Deoxyribodipyrimidine photolyase (EC 4.1.99.3) (DNA photolyase) [1,2] is a DNA repair enzyme. It binds to UV-damaged DNA containing pyrimidine dimers and, upon absorbing a near-UV photon (300 to 500 nm), breaks the cyclobutane ring joining the two pyrimidines of the dimer. DNA photolyase is an enzyme that requires two chromophore-cofactors for its activity: a reduced FADH2 and either 5,10-methenyltetrahydrofolate (5,10-MTHF) or an oxidized 8-hydroxy-5-deazaflavin (8-HDF) derivative (F420). The folate or deazaflavin chromophore appears to function as an antenna, while the FADH2 chromophore is thought to be responsible for electron transfer. On the basis of sequence similarities [3], DNA photolyases can be grouped into two classes. The second class contains enzymes from Myxococcus xanthus, methanogenic archaebacteria, insects, fish and marsupial mammals. It is not yet known what second cofactor is bound to class 2 enzymes. There are a number of conserved sequence regions in all known class 2 DNA photolyases, especially in the C-terminal part. Two of these regions were selected as signature patterns.
Consensus pattern: F-x-E-E-x-[LIVM](2)-R-R-E-L-x(2)-N-F
Consensus pattern: G-x-H-D-x(2)-W-x-E-R-x-[LIVM]-F-G-K-[LIVM]-R-[FY]-M-N

[1] Sancar G.B., Sancar A. Trends Biochem. Sci. 12:259-261(1987).
[2] Jorns M.S. Biofactors 2:207-211(1990).

[3] Yasui A., Eker A.P.M., Yasuhira S., Yajima H., Kobayashi T., Takao M., Oikawa A. EMBO J. 13:6143-6151(1994).

[0456] (DNA_photolyase) DNA photolyases class 1 signatures

Deoxyribodipyrimidine photolyase (EC 4.1.99.3) (DNA photolyase) [1,2] is a DNA repair enzyme. It binds to UV-damaged DNA containing pyrimidine dimers and, upon absorbing a near-UV photon (300 to 500 nm), breaks the cyclobutane ring joining the two pyrimidines of the dimer. DNA photolyase is an enzyme that requires two chromophore-cofactors for its activity: a reduced FADH2 and either 5,10-methenyltetrahydrofolate (5,10-MTHF) or an oxidized 8-hydroxy-5-deazaflavin (8-HDF) derivative (F420). The folate or deazaflavin chromophore appears to function as an antenna, while the FADH2 chromophore is thought to be responsible for electron transfer. On the basis of sequence similarities [3], DNA photolyases can be grouped into two classes. The first class contains enzymes from Gram-negative and Gram-positive bacteria, the halophilic archaebacterium Halobacterium halobium, fungi and plants. Class 1 enzymes bind either 5,10-MTHF (E. coli, fungi, etc.) or 8-HDF (S. griseus, H. halobium). This family also includes Arabidopsis cryptochromes 1 (CRY1) and 2 (CRY2), which are blue light photoreceptors that mediate blue light-induced gene expression [4]. There are a number of conserved sequence regions in all known class 1 DNA photolyases, especially in the C-terminal part. Two of these regions were selected as signature patterns.

[0457] Consensus pattern: T-G-x-P-[LIVM](2)-D-A-x-M-[RA]-x-[LIVM]

Consensus pattern: [DN]-R-x-R-[LIVM](2)-x-[STA](2)-F-[LIVMFA]-x-K-x-L-x(2,3)-W-[KRQ]

[1] Sancar G.B., Sancar A. Trends Biochem. Sci. 12:259-261(1987).
[2] Jorns M.S. Biofactors 2:207-211(1990).
[3] Yasui A., Eker A.P.M., Yasuhira S., Yajima H., Kobayashi T., Takao M., Oikawa A. EMBO J. 13:6143-6151(1994).

[4] Lin C., Ahmad M., Cashmore A.R. Plant J. 10:893-902(1996).

[0458] 138. (DNA_pol_A)

DNA polymerase family A signature

Replicative DNA polymerases (EC 2.7.7.7) are the key enzymes catalyzing the accurate replication of DNA. They require either a small RNA molecule or a protein as a primer for the de novo synthesis of a DNA chain. On the basis of sequence similarities, a number of DNA polymerases have been grouped together [1,2,3] under the designation of DNA polymerase family A. The polymerases that belong to this family are listed below.

- Escherichia coli and various other bacterial polymerase I (gene polA).
- Thermus aquaticus Taq polymerase.
- Bacteriophage SPO1 polymerase.
- Bacteriophage SPO2 polymerase.
- Bacteriophage T5 polymerase.
- Bacteriophage T7 polymerase.
- Mycobacteriophage L5 polymerase.
- Yeast mitochondrial polymerase gamma (gene MIP1).

[0459] Five regions of similarity are found in all the above polymerases. One of these conserved regions, known as 'motif B' [1], is located in a domain which, in Escherichia coli polA, has been shown to bind deoxynucleotide triphosphate substrates. It contains a conserved tyrosine which has been shown, by photo-affinity labelling, to be in the active site; a conserved lysine, also part of this motif, can be chemically labelled using pyridoxal phosphate. This conserved region was used as a signature for this family of DNA polymerases.

[0460] Consensus pattern: R-x(2)-[GSAV]-K-x(3)-[LIVMFY]-[AGQ]-x(2)-Y-x(2)-[GS]-x(3)-[LIVMA]
Sequences known to belong to this class detected by the pattern: ALL.

[1] Delarue M., Poch O., Tordo N., Moras D., Argos P. Protein Eng. 3:461-467(1990).
[2] Ito J., Braithwaite D.K. Nucleic Acids Res. 19:4045-4057(1991).
[3] Braithwaite D.K., Ito J. Nucleic Acids Res. 21:787-802(1993).
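
The consensus patterns in these entries follow the PROSITE notation: 'x' is any residue, square brackets list the residues allowed at a position, and a number in parentheses is a repeat count or range. As an illustration only, the following Python sketch translates that notation into a regular expression and applies it to the family A pattern of paragraph [0460]. The function name prosite_to_regex and the toy sequence are assumptions made for this example and are not part of the described method; the '<' and '>' anchors of the notation are not handled.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style consensus pattern into a Python regex string."""
    pieces = []
    for element in pattern.strip().rstrip(".").split("-"):
        element = element.strip()
        if not element:
            continue
        # Separate the residue specification from an optional repeat
        # such as (3) or (2,4).
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = match.groups()
        if core == "x":
            core = "."                      # any residue
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"  # any residue except those listed
        # Bracketed sets such as [GSAV] and single letters are already valid regex.
        if low and high:
            core += "{%s,%s}" % (low, high)
        elif low:
            core += "{%s}" % low
        pieces.append(core)
    return "".join(pieces)


POL_A = "R-x(2)-[GSAV]-K-x(3)-[LIVMFY]-[AGQ]-x(2)-Y-x(2)-[GS]-x(3)-[LIVMA]"
POL_A_REGEX = prosite_to_regex(POL_A)

# Made-up sequence (not a real polymerase) with the motif placed at index 3.
toy_sequence = "MMM" + "RAAGKAAALAAAYAAGAAAL" + "TTT"
hit = re.search(POL_A_REGEX, toy_sequence)
print(POL_A_REGEX)
print(hit.start(), hit.group())
```

A hit only shows that the sequence is compatible with the signature; it is not by itself evidence of polymerase activity.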

[0461] 139. DNA_pol_viral_C

DNA polymerase (viral) C-terminal domain

Number of members: 128

[0462] 140. (DNA_topoisoII)

DNA topoisomerase II signature

DNA topoisomerase II (EC 5.99.1.3) [1,2,3,4,E1] is one of the two types of enzyme that catalyze the interconversion of topological DNA isomers. Type II topoisomerases are ATP-dependent and act by passing a DNA segment through a transient double-strand break. Topoisomerase II is found in phages, archaebacteria, prokaryotes, eukaryotes, and in African Swine Fever virus (ASF). In bacteriophage T4, topoisomerase II consists of three subunits (the products of genes 39, 52 and 60). In prokaryotes and in archaebacteria the enzyme, known as DNA gyrase, consists of two subunits (genes gyrA and gyrB [E2]). In some bacteria, a second type II topoisomerase has been identified; it is known as topoisomerase IV and is required for chromosome segregation. It also consists of two subunits (genes parC and parE). In eukaryotes, type II topoisomerase is a homodimer.

[0463] There are many regions of sequence homology between the different subtypes of topoisomerase II. The relation between the different subunits is shown in the following representation.

    <------------------ About 1400 residues ------------------>

    [-------- Protein 39 --*--][------- Protein 52 -------]   Phage T4
    [-------- gyrB --------*--][------- gyrA -------------]   Prokaryote II, Archaebacteria
    [-------- parE --------*--][------- parC -------------]   Prokaryote IV
    [----------------------*------------------------------]   Eukaryote and ASF

    '*': Position of the pattern.



[0464] As a signature pattern for this family of proteins, a region that contains a highly conserved pentapeptide was selected. The pattern is located in gyrB, in parE, and in protein 39 of phage T4 topoisomerase.
[0465] Consensus pattern: [LIVMA]-x-E-G-[DN]-S-A-x-[STAG]
Sequences known to belong to this class detected by the pattern: ALL.



[1] Sternglanz R. Curr. Opin. Cell Biol. 1:533-535(1990).
[2] Bjornsti M.-A. Curr. Opin. Struct. Biol. 1:99-103(1991).
[3] Sharma A., Mondragon A. Curr. Opin. Struct. Biol. 5:39-47(1995).
[4] Roca J. Trends Biochem. Sci. 20:156-160(1995).



[0466] 141. (DSPc) Tyrosine specific protein phosphatases signature and profiles

Tyrosine specific protein phosphatases (EC 3.1.3.48) (PTPase) [1 to 5] are enzymes that catalyze the removal of a phosphate group attached to a tyrosine residue. These enzymes are very important in the control of cell growth, proliferation, differentiation and transformation. Multiple forms of PTPase have been characterized and can be classified into two categories: soluble PTPases and transmembrane receptor proteins that contain PTPase domain(s). The currently known PTPases are listed below.

Soluble PTPases:

- PTPN1 (PTP-1B).
- PTPN2 (T-cell PTPase; TC-PTP).
- PTPN3 (H1) and PTPN4 (MEG), enzymes that contain an N-terminal band 4.1-like domain (see <PDOC00566>) and could act at junctions between the membrane and cytoskeleton.
- PTPN5 (STEP).
- PTPN6 (PTP-1C; HCP; SHP) and PTPN11 (PTP-2C; SH-PTP3; Syp), enzymes which contain two copies of the SH2 domain at their N-terminal extremity. The Drosophila protein corkscrew (gene csw) also belongs to this subgroup.
- PTPN7 (LC-PTP; Hematopoietic protein-tyrosine phosphatase; HePTP).
- PTPN8 (70Z-PEP).
- PTPN9 (MEG2).
- PTPN12 (PTP-G1; PTP-P19).
- Yeast PTP1.
- Yeast PTP2, which may be involved in the ubiquitin-mediated protein degradation pathway.
- Fission yeast pyp1 and pyp2, which play a role in inhibiting the onset of mitosis.
- Fission yeast pyp3, which contributes to the dephosphorylation of cdc2.
- Yeast CDC14, which may be involved in chromosome segregation.
- Yersinia virulence plasmid PTPases (gene yopH).
- Autographa californica nuclear polyhedrosis virus 19 Kd PTPase.

Dual specificity PTPases:

- DUSP1 (PTPN10; MAP kinase phosphatase-1; MKP-1), which dephosphorylates MAP kinase on both Thr-183 and Tyr-185.
- DUSP2 (PAC-1), a nuclear enzyme that dephosphorylates MAP kinases ERK1 and ERK2 on both Thr and Tyr residues.
- DUSP3 (VHR).
- DUSP4 (HVH2).
- DUSP5 (HVH3).
- DUSP6 (Pyst1; MKP-3).
- DUSP7 (Pyst2; MKP-X).
- Yeast MSG5, a PTPase that dephosphorylates MAP kinase FUS3.
- Yeast YVH1.
- Vaccinia virus H1 PTPase, a dual specificity phosphatase.

Receptor PTPases:

Structurally, all known receptor PTPases are made up of a variable length extracellular domain, followed by a transmembrane region and a C-terminal catalytic cytoplasmic domain. Some of the receptor PTPases contain fibronectin type III (FN-III) repeats, immunoglobulin-like domains, MAM domains or carbonic anhydrase-like domains in their extracellular region. The cytoplasmic region generally contains two copies of the PTPase domain. The first seems to have enzymatic activity, while the second is inactive but seems to affect substrate specificity of the first. In these domains, the catalytic cysteine is generally conserved but some other, presumably important, residues are not. In the following table, the domain structure of known receptor PTPases is shown:

                                           Extracellular            Intracellular
                                           Ig    FN-3   CAH   MAM   PTPase
    Leukocyte common antigen (LCA) (CD45)  0     2      0     0     2
    Leukocyte antigen related (LAR)        3     8      0     0     2
    Drosophila DLAR                        3     9      0     0     2
    Drosophila DPTP                        2     2      0     0     2
    PTP-alpha (LRP)                        0     0      0     0     2
    PTP-beta                               0     16     0     0     1
    PTP-gamma                              0     1      1     0     2
    PTP-delta                              0     >7     0     0     2
    PTP-epsilon                            0     0      0     0     2
    PTP-kappa                              1     4      0     1     2
    PTP-mu                                 1     4      0     1     2
    PTP-zeta                               0     1      1     0     2

PTPase domains consist of about 300 amino acids. There are two conserved cysteines; the second one has been shown to be absolutely required for activity. Furthermore, a number of conserved residues in its immediate vicinity have also been shown to be important. A signature pattern for PTPase domains was derived, centered on the active site cysteine. There are three profiles for PTPases: the first one spans the complete domain and is not specific to any subtype, the second profile is specific to dual-specificity PTPases, and the third one to the PTP subfamily.

[0467] Consensus pattern: [LIVMF]-H-C-x(2)-G-x(3)-[STC]-[STAGP]-x-[LIVMFY] [C is the active site residue]



[1] Fischer E.H., Charbonneau H., Tonks N.K. Science 253:401-406(1991).
[2] Charbonneau H., Tonks N.K. Annu. Rev. Cell Biol. 8:463-493(1992).
[3] Trowbridge I.S. J. Biol. Chem. 266:23517-23520(1991).
[4] Tonks N.K., Charbonneau H. Trends Biochem. Sci. 14:497-500(1989).
[5] Hunter T. Cell 58:1013-1016(1989).
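
Because the PTPase signature is centered on the active site cysteine, a match can be used to point at the candidate catalytic residue as well as to flag the domain. The following Python sketch is an illustration under stated assumptions: the regular expression is a hand translation of the pattern [LIVMF]-H-C-x(2)-G-x(3)-[STC]-[STAGP]-x-[LIVMFY], and the function name and the toy sequence are hypothetical.

```python
import re
from typing import List

# Hand-written regex form of [LIVMF]-H-C-x(2)-G-x(3)-[STC]-[STAGP]-x-[LIVMFY];
# the named group marks the cysteine annotated as the active site residue.
PTPASE_SIGNATURE = re.compile(
    r"[LIVMF]H(?P<active_cys>C).{2}G.{3}[STC][STAGP].[LIVMFY]"
)


def active_site_cysteines(sequence: str) -> List[int]:
    """Return 1-based positions of the candidate active site cysteine in each hit."""
    return [m.start("active_cys") + 1 for m in PTPASE_SIGNATURE.finditer(sequence)]


# Made-up sequence for illustration; the motif starts at residue 3.
toy_sequence = "MK" + "VHCSAGVGRTGTF" + "AA"
print(active_site_cysteines(toy_sequence))  # [5]
```

Replacing the cysteine in the toy sequence with another residue removes the hit, which mirrors the statement that this cysteine is required for activity.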



[0468] 142. (DUF10) Uncharacterized protein family UPF0076 signature

The following uncharacterized proteins have been shown [1] to share regions of similarity:

- Goat antigen UK114, a human homolog and the corresponding rat protein, which is known as perchloric acid soluble protein (PSP1). PSP1 [2] may inhibit an initiation stage of cell-free protein synthesis.
- Mouse heat-responsive protein HRSP12.
- Yeast chromosome V hypothetical protein YER057c.
- Yeast chromosome IX hypothetical protein YIL051c.
- Caenorhabditis elegans hypothetical protein C23G10.2.
- Escherichia coli hypothetical protein ycdK.
- Escherichia coli hypothetical protein yhaR.
- Escherichia coli hypothetical protein yjgF and HI0719, the corresponding Haemophilus influenzae protein.
- Escherichia coli hypothetical protein yoaB.
- Bacillus subtilis hypothetical protein yabJ.
- Haemophilus influenzae hypothetical protein HI1627.
- Helicobacter pylori hypothetical protein HP0944.
- Lactococcus lactis aldR.
- Myxococcus xanthus dfrA.
- Synechocystis strain PCC 6803 hypothetical protein slr0709.
- Rhizobium strain NGR234 symbiotic plasmid hypothetical protein y4sK.
- Pyrococcus horikoshii hypothetical protein PH0854.

These are small proteins of around 15 Kd whose sequence is highly conserved. As a signature pattern, a well conserved region located in the C-terminal part of these proteins was selected.

[0469] Consensus pattern: [PA]-[ASTPV]-R-[SACVF]-x-[LIVMFY]-x(2)-[GSAKR]-x-[LMVA]-x(5,8)-[LIVM]-E-[MI]

[1] Bairoch A. Unpublished observations (1995).

[2] Oka T., Tsuji H., Noda C., Sakai K., Hong Y.-M., Suzuki I., Munoz S., Natori Y. J. Biol. Chem. 270:30060-30067(1995).

[0470] 143. (DUF3) Domain of Unknown Function 3
Domain apparently occurring exclusively in eubacteria. Unknown function.
[0471] 144. (DUF6) Integral membrane protein
[0472] This family includes many hypothetical membrane proteins of unknown function. Many of the proteins contain two copies of the aligned region.
[0473] 145. (DUF7) Integral membrane protein

[0474] This family includes many hypothetical membrane proteins of unknown function. Swiss P14502 has been implicated in resistance to ethidium bromide.

[0475] 146. (DapB) Dihydrodipicolinate reductase signature

Dihydrodipicolinate reductase (EC 1.3.1.26) catalyzes the second step in the biosynthesis of diaminopimelic acid and lysine, the NAD or NADP-dependent reduction of 2,3-dihydrodipicolinate into 2,3,4,5-tetrahydrodipicolinate. This enzyme is present in bacteria (gene dapB) and higher plants. As a signature pattern, the best conserved region in this enzyme was selected. It is located in the central section and is part of the substrate-binding region [1].
[0476] Consensus pattern: E-[IV]-x-E-x-H-x(3)-K-x-D-x-P-S-G-T-A
[0477] [1] Scapin G., Blanchard J.S., Sacchettini J.C. Biochemistry 34:3502-3512(1995).
[0478] 147. DedA family

[0479] This family combines the DedA related proteins and the YIAN/YGIK family. Members of this family are not functionally characterised. These proteins contain multiple predicted transmembrane regions.
[0480] 148. DegT/DnrJ/EryC1/StrS family

[0481] The members of this family exhibit some characteristics of the sensor protein of two-component signal transduction systems; however, none of the members show any sequence similarity to these protein kinases. The members of this family do have the typical helix-turn-helix motif of DNA binding proteins.
[0482] [1] Stutzman-Engwall KJ, Otten SL, Hutchinson CR, J Bacteriol 1992;174:144-154.
[0483] 149. (Desaturase) Fatty acid desaturases signatures

Fatty acid desaturases (EC 1.14.99.-) are enzymes that catalyze the insertion of a double bond at the delta position of fatty acids. There seem to be two distinct families of fatty acid desaturases which do not seem to be evolutionarily related.

Family 1 is composed of:

- Stearoyl-CoA desaturase (SCD) (EC 1.14.99.5) [1]. SCD is a key regulatory enzyme of unsaturated fatty acid biosynthesis. SCD introduces a cis double bond at the delta(9) position of fatty acyl-CoA's such as palmitoleoyl- and oleoyl-CoA. SCD is a membrane-bound enzyme that is thought to function as a part of a multienzyme complex in the endoplasmic reticulum of vertebrates and fungi.

As a signature pattern for this family, a conserved region in the C-terminal part of these enzymes was selected; this region is rich in histidine residues and in aromatic residues.

Family 2 is composed of:

- Plant stearoyl-acyl-carrier-protein desaturase (EC 1.14.99.6) [2]. These enzymes catalyze the introduction of a double bond at the delta(9) position of stearoyl-ACP to produce oleoyl-ACP. This enzyme is responsible for the conversion of saturated fatty acids to unsaturated fatty acids in the synthesis of vegetable oils.
- Cyanobacteria desA [3], an enzyme that can introduce a second cis double bond at the delta(12) position of fatty acids bound to membrane glycerolipids. DesA is involved in chilling tolerance, the phase transition temperature of lipids of cellular membranes being dependent on the degree of unsaturation of fatty acids of the membrane lipids.

As a signature pattern for this family, a conserved region in the C-terminal part of these enzymes was selected.

[0484] Consensus pattern: G-E-x-[FY]-H-N-[FY]-H-H-x-F-P-x-D-Y

Consensus pattern: [ST]-[SA]-x(3)-[QR]-[LI]-x(5,6)-D-Y-x(2)-[LIVMFYW]-[LIVM]-[DE]

[1] Kaestner K.H., Ntambi J.M., Kelly T.J. Jr., Lane M.D. J. Biol. Chem. 264:14755-14761(1989).
[2] Shanklin J., Somerville C.R. Proc. Natl. Acad. Sci. U.S.A. 88:2510-2514(1991).
[3] Wada H., Gombos Z., Murata N. Nature 347:200-203(1990).

[0485] 150. Dihydroorotase signatures 

Dihydroorotase (EC 3.5.2.3) (DHOase) catalyzes the third step in the de novo biosynthesis of pyrimidine, the conversion of ureidosuccinic acid (N-carbamoyl-L-aspartate) into dihydroorotate. Dihydroorotase binds a zinc ion which is required for its catalytic activity [1]. In bacteria, DHOase is a dimer of identical chains of about 400 amino-acid residues (gene pyrC). In higher eukaryotes, DHOase is part of a large multi-functional protein known as 'rudimentary' in Drosophila and CAD in mammals, which catalyzes the first three steps of pyrimidine biosynthesis [2]. The DHOase domain is located in the central part of this polyprotein. In yeasts, DHOase is encoded by a monofunctional protein (gene URA4). However, a defective DHOase domain [3] is found in a multifunctional protein (gene URA2) that catalyzes the first two steps of pyrimidine biosynthesis. The comparison of DHOase sequences from various sources shows [4] that there are two highly conserved regions. The first, located in the N-terminal extremity, contains two histidine residues suggested [3] to be involved in binding the zinc ion. The second is found in the C-terminal part. Signature patterns for both regions have been developed. Allantoinase (EC 3.5.2.5) is the enzyme that hydrolyzes allantoin into allantoate. In yeast (gene DAL1) [5], it is the first enzyme in the allantoin degradation pathway; in amphibians [6] and fish it catalyzes the second step in the degradation of uric acid. The sequence of allantoinase is evolutionarily related to that of DHOases.
[0486] Consensus pattern: D-[LIVMFYWSAP]-H-[LIVA]-H-[LIVF]-[RN]-x-[PGANF] [The two H's are probable zinc ligands]

Consensus pattern: [GA]-[ST]-D-x-A-P-H-x(4)-K

[1] Brown D.C., Collins K.D. J. Biol. Chem. 266:1597-1604(1991).
[2] Davidson J.N., Chen K.C., Jamison R.S., Musmanno L.A., Kern C.B. BioEssays 15:157-164(1993).
[3] Souciet J.-L., Nagy M., Le Gouar M., Lacroute F., Potier S. Gene 79:59-70(1989).
[4] Guyonvarch A., Nguyen-Juilleret M., Hubert J.-C., Lacroute F. Mol. Gen. Genet. 212:134-141(1988).
[5] Buckholz R.G., Cooper T.G. Yeast 7:913-923(1991).
[6] Hayashi S., Jain S., Chu R., Alvares K., Xu B., Erfurth F., Usuda N., Rao M.S., Reddy S.K., Noguchi T., Reddy J.K., Yeldandi A.V. J. Biol. Chem. 269:12269-12276(1994).

[0487] 151. dnaJ domains signatures and profile

[0488] The prokaryotic heat shock protein dnaJ interacts with the chaperone hsp70-like dnaK protein [1]. Structurally, the dnaJ protein consists of an N-terminal conserved domain (called 'J' domain) of about 70 amino acids, a glycine-rich region ('G' domain) of about 30 residues, a central domain containing four repeats of a CXXCXGXG motif ('CRR' domain) and a C-terminal region of 120 to 170 residues. Such a structure is shown in the following schematic representation:

    +------------+-+-------+-----+-----------+--------------+
    | N-terminal | | Gly-R |     | CXXCXGXG  | C-terminal   |
    +------------+-+-------+-----+-----------+--------------+



[0489] It has been shown [2] that the 'J' domain as well as the 'CRR' domain are also found in other prokaryotic and eukaryotic proteins which are listed below.

a) Proteins containing both a 'J' and a 'CRR' domain:

- Yeast protein MAS5/YDJ1, which seems to be involved in mitochondrial protein import.
- Yeast protein MDJ1, involved in mitochondrial biogenesis and protein folding.
- Yeast protein SCJ1, involved in protein sorting.
- Yeast protein XDJ1.
- Plant dnaJ homologs (from leek and cucumber).
- Human HDJ2, a dnaJ homolog of unknown function.
- Yeast hypothetical protein YNL077w.

b) Proteins containing a 'J' domain without a 'CRR' domain:

- Rhizobium fredii nolC, a protein involved in cultivar-specific nodulation of soybean.
- Escherichia coli cbpA [3], a protein that binds curved DNA.
- Yeast protein SEC63/NPL1, important for protein assembly into the endoplasmic reticulum and the nucleus.
- Yeast protein SIS1, required for nuclear migration during mitosis.
- Yeast protein CAJ1.
- Yeast hypothetical protein YFR041c.
- Yeast hypothetical protein YIR004w.
- Yeast hypothetical protein YJL162c.
- Plasmodium falciparum ring-infected erythrocyte surface antigen (RESA). RESA, whose function is not known, is associated with the membrane skeleton of newly invaded erythrocytes.
- Human HDJ1.
- Human HSJ1, a neuronal protein.
- Drosophila cysteine-string protein (csp).

[0490] A signature pattern for the 'J' domain was developed, based on conserved positions in the C-terminal half of this domain. A pattern for the 'CRR' domain, based on the first two copies of that motif, was also developed. A profile for the 'J' domain was also developed.

[0491] Consensus pattern: [FY]-x(2)-[LIVMA]-x(3)-[FYWNT]-[DENQSA]-x-L-x-[DN]-x(3)-[KR]-x(2)-[FYI]
Consensus pattern C- [DEGSTHKR]--vX-*<^^ 

ss 

[1] Cyr D.M., Langer T., Douglas M.G. Trends Biochem. Sci. 19:176-181(1994).

[2] Bork P., Sander C., Valencia A., Bukau B. Trends Biochem. Sci. 17:129-129(1992).

[3] Ueguchi C., Kaneda M., Yamada K., Mizuno T. Proc. Natl. Acad. Sci. U.S.A. 91:1054-1058(1994).
[0492] 152.
[0493] 153. Dwarfin

[0494] This family, known as the dwarfins, also includes the Drosophila protein MAD. The N-terminus of MAD can bind to DNA [2].

[0495] [1] Yingling JM, Das P, Savage C, Zhang M, Padgett RW, Wang XF, Proc Natl Acad Sci USA 1996;93:8940-8944. [2] Kim J, Johnson K, Chen HJ, Carroll S, Laughon A, Nature 1997;388:304-308.
[0496] 154. Dynein light chain type 1 signature

Dynein is a multisubunit microtubule-dependent motor enzyme that acts as the force generating protein of eukaryotic cilia and flagella. The cytoplasmic isoform of dynein acts as a motor for the intracellular retrograde motility of vesicles and organelles along microtubules. Dynein is composed of a number of ATP-binding large subunits, intermediate size subunits and small subunits. Among the small subunits, there is a family [1,2] of highly conserved proteins which consists of:

- Chlamydomonas reinhardtii flagellar outer arm dynein 8 Kd and 11 Kd light chains.
- Higher eukaryotes cytoplasmic dynein light chain 1.
- Yeast cytoplasmic dynein light chain 1 (gene DYN2 or SLC1).
- Caenorhabditis elegans hypothetical dynein light chains M18.2 and T26A5.9.

These proteins have from 89 to 120 amino acids. As a signature pattern, a highly conserved region was selected.

Consensus pattern: H-x-I-x-G-[KR]-x-F-[GA]-S-x-V-[ST]-[HY]-E

[1] King S.M., Patel-King R.S. J. Biol. Chem. 270:11445-11452(1995).
[2] Dick T., Ray K., Salz H.K., Chia W. Mol. Cell. Biol. 18:1966-1977(1998).

[0497] 155. dUTPase

[0498] dUTPase hydrolyzes dUTP to dUMP and pyrophosphate.

[0499] [1] Cedergren-Zeppezauer ES, Larsson G, Nyman PO, Dauter Z, Wilson KS, Nature 1992;355:740-743. [2] Mol CD, Harris JM, McIntosh EM, Tainer JA, Structure 1998;4:1077-1092.

[0500] 156. (dCMP_cyt_deam) Cytidine and deoxycytidylate deaminases zinc-binding region signature

Cytidine deaminase (EC 3.5.4.5) (cytidine aminohydrolase) catalyzes the hydrolysis of cytidine into uridine and ammonia, while deoxycytidylate deaminase (EC 3.5.4.12) (dCMP deaminase) hydrolyzes dCMP into dUMP. Both enzymes are known to bind zinc and to require it for their catalytic activity [1,2]. These two enzymes do not share any sequence similarity with the exception of a region that contains three conserved histidine and cysteine residues which are thought to be involved in the binding of the catalytic zinc ion. Such a region is also found in other proteins [3,4]:

- Yeast cytosine deaminase (EC 3.5.4.1) (gene FCY1), which transforms cytosine into uracil.
- Mammalian apolipoprotein B mRNA editing protein, responsible for the post-transcriptional editing of a CAA codon into a UAA (stop) codon in the APOB mRNA.
- Riboflavin biosynthesis protein ribG, which converts 2,5-diamino-6-(ribosylamino)-4(3H)-pyrimidinone 5'-phosphate into 5-amino-6-(ribosylamino)-2,4(1H,3H)-pyrimidinedione 5'-phosphate.
- Bacillus cereus blasticidin-S deaminase (EC 3.5.4.23), which catalyzes the deamination of the cytosine moiety of the antibiotics blasticidin S, cytomycin and acetyl blasticidin S.
- Bacillus subtilis protein comEB. This protein is required for the binding and uptake of transforming DNA.
- Bacillus subtilis hypothetical protein yaaJ.
- Escherichia coli hypothetical protein yfhC.
- Yeast hypothetical protein YJL035c.

A signature pattern for this zinc-binding region was derived.

[0501] Consensus pattern: [CH]-[AGV]-E-x(2)-[LIVMFGAT]-[LIVM]-x(17,33)-P-C-x(2,8)-C-x(3)-[LIVM] [The C's and H are zinc ligands]

[1] Yang C., Carlow D., Wolfenden R., Short S.A. Biochemistry 31:4168-4174(1992).

[2] Moore J.T., Silversmith R.E., Maley G.F., Maley F. J. Biol. Chem. 268:2288-2291(1993).

[3] Reizer J., Buskirk S., Bairoch A., Reizer A., Saier M.H. Jr. Protein Sci. 3:853-856(1994).

[4] Bhattacharya S., Navaratnam N., Morrison J.R., Scott J., Taylor W.R. Trends Biochem. Sci. 19:105-106(1994).

[0502] 157. Dehydrins signatures

A number of proteins are produced by plants that experience water-stress. Water-stress takes place when the water available to a plant falls below a critical level. The plant hormone abscisic acid (ABA) appears to modulate the response of plants to water-stress. Proteins that are expressed during water-stress are called dehydrins [1,2] or LEA group 2 proteins [3]. The proteins that belong to this family are listed below.

- Arabidopsis thaliana XERO 1, XERO 2 (LTI30), RAB18, ERD10 (LTI45), ERD14 and COR47.
- Barley dehydrins B8, B9, B17, and B18.
- Cotton LEA protein D-11.
- Craterostigma plantagineum desiccation-related proteins A and B.
- Maize dehydrin M3 (RAB-17).
- Pea dehydrins DHN1, DHN2, and DHN3.
- Radish LEA protein.
- Rice proteins RAB 16B, 16C, 16D, RAB21, and RAB25.
- Tomato TAS14.
- Wheat dehydrin RAB 15 and cold-shock proteins cor410, cs66 and cs120.

[0503] Dehydrins share a number of structural features. One of the most notable features is the presence, in their central region, of a continuous run of five to nine serines followed by a cluster of charged residues. Such a region has been found in all known dehydrins so far, with the exception of pea dehydrins. A second conserved feature is the presence of two copies of a lysine-rich octapeptide; the first copy is located just after the cluster of charged residues that follows the poly-serine region, and the second copy is found at the C-terminal extremity. Signature patterns for both regions were derived.

[0504] Consensus pattern: S(5)-[DE]-x-[DE]-G-x(1,2)-G-x(0,1)-[KR](4)
Consensus pattern: [KR]-[LIM]-K-[DE]-K-[LIM]-P-G

[1] Close T.J., Kortt A.A., Chandler P.M. Plant Mol. Biol. 13:95-108(1989).

[2] Robertson M., Chandler P.M. Plant Mol. Biol. 19:1031-1044(1992).

[3] Dure L. III, Crouch M., Harada J., Ho T.-H.D., Mundy J., Quatrano R., Thomas T., Sung Z.R. Plant Mol. Biol. 12:475-486(1989).

[0505] 158. (deoR) Bacterial regulatory proteins, deoR family signature

The many bacterial transcription regulation proteins which bind DNA through a 'helix-turn-helix' motif can be classified into subfamilies on the basis of sequence similarities. One of these subfamilies groups the following proteins [1,2]:

- accR, the Agrobacterium tumefaciens plasmid pTiC58 repressor of opine catabolism and conjugal transfer.
- agaR, the Escherichia coli aga operon putative repressor.
- deoR, the Escherichia coli deoxyribose operon repressor.
- fucR, the Escherichia coli L-fucose operon activator.
- gatR, the Escherichia coli galactitol operon repressor.
- glpR, the Escherichia coli glycerol-3-phosphate regulon repressor.
- gutR (or srlR), the Escherichia coli glucitol operon repressor.
- iolR from Bacillus subtilis.
- lacR, the streptococci lactose phosphotransferase system repressor.
- spoIIIC, the Bacillus subtilis transcription regulator of the sigK gene.
- yfjR, an Escherichia coli hypothetical protein.
- ygbI, an Escherichia coli hypothetical protein.
- yihW, an Escherichia coli hypothetical protein.
- yjfQ, an Escherichia coli hypothetical protein.
- yjhJ, an Escherichia coli hypothetical protein.

The 'helix-turn-helix' DNA-binding motif of these proteins is located in the N-terminal part of the sequence. The pattern used to detect these proteins starts fourteen residues before the HTH motif and ends one residue after it.

[0506] Consensus pattern: R-x(3)-[LIVM]-x(3)-[LIVM]-x(16,17)-[STA]-x(2)-T-[LIVMA]-[RH]-[KRNA]-D-[LIVMF]

[1] von Bodman S.B., Hayman G.T., Farrand S.K. Proc. Natl. Acad. Sci. U.S.A. 89:642-647(1992).

[2] Bairoch A. Unpublished observations (1993).

[0507] 159. dsrm
Double-stranded RNA binding motif
[1] Burd C.G., Dreyfuss G. Medline: 94310455. Conserved structures and diversity of functions of RNA-binding proteins. Science 1994;265:615-621.

[0508] Sequences gathered for seed by HMM_iterative_training. Putative motif shared by proteins that bind to dsRNA. At least some DSRM proteins seem to bind to specific RNA targets. This is exemplified by Staufen, which is involved in localisation of at least five different mRNAs in the early Drosophila embryo, and also by the interferon-induced protein kinase in humans, which is part of the cellular response to dsRNA.
[0509] Number of members: 116 
[0510] 160. Dynamin family signature

Dynamin [1,2] is a microtubule-associated force-producing protein of 100 Kd which is involved in the production of microtubule bundles and which is able to bind and hydrolyze GTP. Dynamin is structurally related to the following proteins:

- Drosophila shibire protein (gene shi) [3]. Shibire is very probably the Drosophila cognate of mammalian dynamin. It seems to provide the motor for vesicular transport during endocytosis.
- Yeast vacuolar sorting protein VPS1 (or SPO15) [4], a protein which could also be involved in microtubule-associated motility.
- Yeast protein MGM1 [5], which is required for mitochondrial genome maintenance.
- Yeast protein DNM1, which is involved in endocytosis.
- Interferon induced Mx proteins [6,7]. Interferon alpha or beta induce the synthesis of a family of closely related proteins. Most of these proteins are known to confer resistance to influenza viruses and/or rhabdoviruses on transfected mammalian cells in culture.

The three motifs found in all GTP-binding proteins are located in the N-terminal part of these proteins. The signature pattern that was developed for these proteins is based on a highly conserved region downstream of the ATP/GTP-binding motif 'A' (P-loop) (see <PDOC00017>).
[0511] Consensus pattern: L-P-[RK]-G-[STN]-[GN]-[LIVM]-V-T-R

[1] Vallee R.B., Shpetner H.S. Annu. Rev. Biochem. 59:909-932(1990).

[2] Obar R.A., Collins C.A., Hammarback J.A., Shpetner H.S., Vallee R.B. Nature 347:256-261(1990).

[3] van der Bliek A.M., Meyerowitz E.M. Nature 351:411-414(1991).

[4] Rothman J.H., Raymond C.K., Gilbert T., O'Hara P.J., Stevens T.H. Cell 61:1063-1074(1990).

[5] Jones B.A., Fangman W.L. Genes Dev. 6:380-389(1992).

[6] Arnheiter H., Meier E. New Biol. 2:851-857(1990).

[7] Staeheli P., Pitossi F., Pavlovic J. Trends Cell Biol. 3:268-272(1993).

[0512] 161. (dynamin_2) Dynamin central region

[0513] This region lies between the GTPase domain (see dynamin) and the pleckstrin homology (PH) domain.
[0514] 162. E1-E2 ATPases phosphorylation site

E1-E2 ATPases (also known as P-type) are cation transport ATPases which form an aspartyl phosphate intermediate in the course of ATP hydrolysis. ATPases which belong to this family are listed below [1,2,3].

- Fungal and plant plasma membrane (H+) ATPases [reviewed in 4].
- Vertebrate (Na+, K+) ATPases (sodium pump) [reviewed in 5,6].
- Gastric (K+, H+) ATPases (proton pump).
- Calcium (Ca++) ATPases (calcium pump) from the sarcoplasmic reticulum (SR), the endoplasmic reticulum (ER) and the plasma membrane.
- Copper (Cu++) ATPases (copper pump) which are involved in two human genetic disorders: Menkes syndrome and Wilson disease [7].
- Bacterial potassium (K+) ATPases.
- Bacterial cadmium efflux (Cd++) ATPases [reviewed in 8].
- Bacterial magnesium (Mg++) ATPases.
- A probable cation ATPase from Leishmania.
- fixI, a probable cation ATPase from Rhizobium meliloti, involved in nitrogen fixation.

The region around the phosphorylated aspartate residue is perfectly conserved in all these ATPases and can be used as a signature pattern.

[0515] Consensus pattern: D-K-T-G-T-[LIVM]-[TI] [D is phosphorylated]
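
The bracketed remark in the pattern identifies its first residue, the aspartate, as the phosphorylation site. As a sketch only, the Python fragment below scans a sequence for the signature and reports the 1-based position of that aspartate in each hit. The regular expression assumes that the pattern reads D-K-T-G-T-[LIVM]-[TI]; the function name and the test fragment are hypothetical and serve only as an illustration.

```python
import re
from typing import List

# Assumed reading of the signature: D-K-T-G-T-[LIVM]-[TI].
# The lookahead lets overlapping hits be reported as well.
P_TYPE_SITE = re.compile(r"(?=DKTGT[LIVM][TI])")


def phospho_asp_positions(sequence: str) -> List[int]:
    """Return the 1-based position of the aspartate that opens each signature hit."""
    return [hit.start() + 1 for hit in P_TYPE_SITE.finditer(sequence.upper())]


if __name__ == "__main__":
    fragment = "AAICSDKTGTLTQN"  # hypothetical fragment used only as test input
    print(phospho_asp_positions(fragment))
```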

[1] Green N.M., MacLennan D.H. Biochem. Soc. Trans. 17:819-822(1989).
[2] Green N.M. Biochem. Soc. Trans. 17:970-972(1989).
[3] Fagan M.J., Saier M.H. Jr. J. Mol. Evol. 38:57-99(1994).
[4] Serrano R. Biochim. Biophys. Acta 947:1-28(1988).
[5] Fambrough D.M. Trends Neurosci. 11:325-328(1988).

[6] Sweadner K.J. Biochim. Biophys. Acta 988:185-220(1989).
[7] Bull P.C., Cox D.W. Trends Genet. 10:246-251(1994).

[8] Silver S., Nucifora G., Chu L., Misra T.K. Trends Biochem. Sci. 14:76-80(1989).

[0516] 163. E1_N

E1 Protein, N terminal domain
Number of members: 90

[0517] 164. (E1_dehydrog) Dehydrogenase E1 component

[0518] This family uses thiamine pyrophosphate as a cofactor. This family includes pyruvate dehydrogenase, 2-oxoglutarate dehydrogenase and 2-oxoisovalerate dehydrogenase.
[0519] 165. (ECH) Enoyl-CoA hydratase/isomerase signature

Enoyl-CoA hydratase (EC 4.2.1.17) (ECH) [1] and 3-2trans-enoyl-CoA isomerase (EC 5.3.3.8) (ECI) [2] are two enzymes involved in fatty acid metabolism. ECH catalyzes the hydration of 2-trans-enoyl-CoA into 3-hydroxyacyl-CoA, and ECI shifts the 3- double bond of the intermediates of unsaturated fatty acid oxidation to the 2-trans position. Most eukaryotic cells have two fatty-acid beta-oxidation systems, one located in mitochondria and the other in peroxisomes. In mitochondria, ECH and ECI are separate yet structurally related monofunctional enzymes. Peroxisomes contain a trifunctional enzyme [3] consisting of an N-terminal domain that bears both ECH and ECI activity, and a C-terminal domain responsible for 3-hydroxyacyl-CoA dehydrogenase (HCDH) activity. In Escherichia coli (gene fadB) and Pseudomonas fragi (gene faoA), ECH and ECI are also part of a multifunctional enzyme which contains both a HCDH and a 3-hydroxybutyryl-CoA epimerase domain [4]. A number of other proteins have been found to be evolutionary related to the ECH/ECI enzymes or domains:

- 3-hydroxybutyryl-CoA dehydratase (EC 4.2.1.55) (crotonase), a bacterial enzyme involved in the butyrate/butanol-producing pathway.
- Naphthoate synthase (EC 4.1.3.36) (DHNA synthetase) (gene menB) [5], a bacterial enzyme involved in the biosynthesis of menaquinone (vitamin K2). DHNA synthetase converts O-succinyl-benzoyl-CoA (OSB-CoA) to 1,4-dihydroxy-2-naphthoic acid (DHNA).
- 4-chlorobenzoate dehalogenase (EC 3.8.1.6) [6], a Pseudomonas enzyme which catalyzes the conversion of 4-chlorobenzoate-CoA to 4-hydroxybenzoate-CoA.
- A Rhodobacter capsulatus protein of unknown function (ORF257) [7].
- Bacillus subtilis putative polyketide biosynthesis proteins pksH and pksI.
- Escherichia coli carnitine racemase (gene caiD) [8].
- Escherichia coli hypothetical protein ygfG.
- Yeast hypothetical protein YDR036c.

As a signature pattern for these enzymes, a conserved region rich in glycine and hydrophobic residues was selected.

[0520] Consensus pattern: [LIVM]-[STA]-x-[LIVM]-[DENQRHSTA]-G-x(3)-[AG](3)-x(4)-[LIVMST]-x-[CSTA]-[DQHP]-[LIVMFY]

[1] Minami-Ishii N., Taketani S., Osumi T., Hashimoto T. Eur. J. Biochem. 185:73-78(1989).

[2] Mueller-Newen G., Stoffel W. Biol. Chem. Hoppe-Seyler 372:613-624(1991).

[3] Palosaari P.M., Hiltunen J.K. J. Biol. Chem. 265:2448-2449(1990).

[4] Nakahigashi K., Inokuchi H. Nucleic Acids Res. 18:4937-4937(1990).

[5] Driscoll J.R., Taber H.W. J. Bacteriol. 174:5063-5071(1992).
[6] Babbitt P.C., Kenyon G.L., Martin B.M., Charest H., Sylvestre M., Scholten J.D., Chang K.-H., Liang P.-H., Dunaway-Mariano D. Biochemistry 31:5594-5604(1992).

[7] Beckman D.L., Kranz R.G. Gene 107:171-172(1991).

[8] Eichler K., Bourgis F., Buchet A., Kleber H.-P., Mandrand-Berthelot M.-A. Mol. Microbiol. 13:775-786(1994).

[0521] 166. (EF1BD) Elongation factor 1 beta/beta'/delta chain signatures

Eukaryotic elongation factor 1 (EF-1) is responsible for the GTP-dependent binding of aminoacyl-tRNAs to the ribosomes [1]. EF-1 is composed of four subunits: the alpha chain which binds GTP and aminoacyl-tRNAs, the gamma chain that probably plays a role in anchoring the complex to other cellular components, and the beta and delta (or beta') chains. The beta and delta chains are highly similar proteins that both stimulate the exchange of GDP bound to the alpha chain for GTP [2]. The beta and delta chains are hydrophilic proteins of around 23 to 31 Kd. Their C-terminal part seems important for the nucleotide exchange activity while the N-terminal section is probably involved in the interaction with the gamma chain. Two signature patterns for this family of proteins were developed. The first corresponds to an acidic region in the central section; the second, to the C-terminal extremity of these proteins.
[0522] Consensus pattern: [DE]-[DEG]-[DE](2)-[LIVMF]-D-L-F-G

Consensus pattern: [IV]-Q-S-x-D-[LIVM]-x-A-[FWM]-[NQ]-K-[LIVM]

[1] Riis B., Rattan S.I.S., Clark B.F.C., Merrick W.C. Trends Biochem. Sci. 15:420-424(1990).

[2] van Damme H.T.F., Amons R., Karssies R., Timmers C.J., Janssen G.M.C., Moeller W. Biochim. Biophys. Acta 1050:241-247(1990).

[0523] 167. (EF1G_domain) Elongation factor 1 gamma, conserved domain
[0524] 168. (EFG_C) Elongation factor G C-terminus

[0525] This family is always found associated with GTP_EFTU. This family includes the carboxyl terminal regions of Elongation factor G, elongation factor 2 and some tetracycline resistance proteins.
[0526] 169. (EFP) Elongation factor P signature

Elongation factor P (EF-P) [1] is a prokaryotic protein translation factor required for efficient peptide bond synthesis on ribosomes from fMet-tRNAfMet. EF-P is a protein of 21 Kd. It is evolutionary related to yeiP, an hypothetical protein from Escherichia coli. As a signature pattern, a conserved region located in the C-terminal part of these proteins was selected.

[0527] Consensus pattern: K-x-A-x(4)-G-x(2)-[LIV]-x-V-P-x(2)-[LIV]-x(2)-G
[1] Aoki H., Adams S.-L., Turner M.A., Ganoza M.C. Biochimie 79:7-11(1997).
[0528] 170. (EF_TS) Elongation factor Ts signatures

In prokaryotes, elongation factor Ts (EF-Ts) is a component of the elongation cycle of protein biosynthesis. It associates with the EF-Tu.GDP complex and induces the exchange of GDP to GTP; it remains bound to the aminoacyl-tRNA.EF-Tu.GTP complex up to the GTP hydrolysis stage on the ribosome [1]. EF-Ts is also a component of the chloroplast protein biosynthetic machinery and is encoded in the genome of some algal chloroplasts [2]. It is also present in mitochondria [3]. As signature patterns for EF-Ts, two conserved regions located in the N-terminal part of the protein have been selected.

[0529] Consensus pattern: L-R-x(2)-T-[GSDNQ]-x-[GS]-[LIVMF]-x(0,1)-[DENKAC]-x-K-[KRNEQS]-[AV]-L
[0530] Consensus pattern: E-[LIVM]-[NV]-[SCV]-[QE]-T-D-F-V-[SA]-[KRN]

[1] Bubunenko M.G., Kireeva M.L., Gudkov A.T. Biochimie 74:419-425(1992).
[2] Kostrzewa M., Zetsche K. Plant Mol. Biol. 23:67-76(1993).

[3] Xin H., Woriax V.L., Burkhart W.A., Spremulli L.L. J. Biol. Chem. 270:17243-17249(1995).

[0531] 171. (EMP24_GP25L) emp24/gp25L/p24 family

[0532] Members of this family are implicated in bringing cargo forward from the ER and binding to coat proteins by their cytoplasmic domains. Number of members: 30
[0533] Paccaud JP, Thomas DY, Bergeron JJ, Nilsson T. J Cell Biol 1998;140:751-765.

172. ENV_polyprotein

ENV polyprotein (coat polyprotein) 

Number of members: ^24 

[0534] 173. (ERG4_ERG24) Ergosterol biosynthesis ERG4/ERG24 family signatures

Two fungal enzymes involved in ergosterol biosynthesis, which act by reducing double bonds in precursors of ergosterol, have been shown to be evolutionary related [1]. These are C-14 sterol reductase (gene ERG24 in budding yeast and erg3 in Neurospora crassa) and C-24(28) sterol reductase (gene ERG4 in budding yeast and sts1 in fission yeast). Their sequences are also highly related to that of chicken lamin B receptor, which is thought to anchor the lamina to the inner nuclear membrane. These proteins are highly hydrophobic and seem to contain seven or eight transmembrane regions. As signature patterns, two conserved regions were selected. The first one is apparently located in a loop between the fourth and fifth transmembrane regions and the second is in the C-terminal section.
[0535] Consensus pattern: G-x(2)-[LIVM]-[YH]-D-x-[FYV]-x-G-x(2)-L-N-P-R
Consensus pattern: [LIVM](2)-H-R-x(2)-R-D-x(3)-C-x(2)-K-Y-G

[1] Lai M.H., Bard M., Pierson C.A., Alexander J.F., Goebl M., Carter G.T., Kirsch D.R. Gene 140:41-49(1994).
[0536] 174. (ERM) Ezrin/radixin/moesin family

[0537] This family of proteins contains a band 4.1 domain (Band_41) at the amino terminus. This family represents the rest of these proteins.

[0538] [1] Yonemura S, Hirao M, Doi Y, Takahashi N, Kondo T, Tsukita S. J Cell Biol 1998;140:885-895.

[0539] 175. ER lumen protein retaining receptor signatures

Proteins that reside in the lumen of the endoplasmic reticulum (ER) contain a C-terminal tetrapeptide (generally K-D-E-L or H-D-E-L) that serves as a signal for their retrieval (retrograde transport) from subsequent compartments of the secretory pathway. The signal is recognized by a receptor molecule that is believed to cycle between the cis side of the Golgi apparatus and the ER [1]. This protein is known as the ER lumen protein retaining receptor or also as the "KDEL receptor". It has been characterized in a variety of species, including fungi (gene ERD2), plants, Plasmodium, Drosophila and mammals. In mammals two highly related forms of the receptor are known. Structurally, the receptor is a protein of about 220 residues that seems to contain seven transmembrane regions [2]. The N-terminal part (3 residues) is oriented toward the lumen while the C-terminal tail (about 12 residues) is cytoplasmic. There are three lumenal and three cytoplasmic loops. Two signature patterns for these receptors were developed. The first pattern corresponds to the C-terminal half of the first cytoplasmic loop as well as most of the second transmembrane domain. The second pattern is a perfectly conserved decapeptide that corresponds to the central part of the fifth transmembrane domain.
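
The retrieval signal described above is defined purely by the last four residues of a protein. A minimal sketch of such a check in Python, assuming only the two tetrapeptides named in the text (K-D-E-L and H-D-E-L) and using hypothetical function and sequence names, could look as follows. Other variants of the signal may exist and are deliberately not covered by this sketch.

```python
# The two C-terminal tetrapeptides named in the text.
ER_RETRIEVAL_SIGNALS = ("KDEL", "HDEL")


def has_er_retrieval_signal(sequence: str) -> bool:
    """Return True if the sequence ends with one of the listed tetrapeptides."""
    return sequence.strip().upper().endswith(ER_RETRIEVAL_SIGNALS)


if __name__ == "__main__":
    # Hypothetical sequences: the signal only counts at the C-terminus.
    for name, seq in [("c_terminal_signal", "MKLSAVTEEKDEL"),
                      ("internal_only", "MKDELSAVTEEKA")]:
        print(name, has_er_retrieval_signal(seq))
```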

[0540] Consensus pattern: G-I-S-x-[KR]-x-Q-x-L-[FY]-x-[LIV](2)-F-x(2)-R-Y
Consensus pattern: L-E-[SA]-V-A-I-[LM]-P-Q-L

[1] Pelham H.R.B. Curr. Opin. Cell Biol. 3:585-591(1991).

[2] Townsley F.M., Wilson D.W., Pelham H.R.B. EMBO J. 12:2821-2829(1993).

[0541] 176. (ETF_beta) Electron transfer flavoprotein beta-subunit signature

The electron transfer flavoprotein (ETF) [1,2] serves as a specific electron acceptor for various mitochondrial dehydrogenases. ETF transfers electrons to the main respiratory chain via ETF-ubiquinone oxidoreductase. ETF is a heterodimer that consists of an alpha and a beta subunit and binds one molecule of FAD per dimer. A similar system also exists in some bacteria. The beta subunit of ETF is a protein of about 28 Kd which is structurally related to the bacterial nitrogen fixation protein fixA, which could play a role in a redox process and feed electrons to ferredoxin. Other related proteins are:

- Escherichia coli hypothetical protein ydiQ.
- Escherichia coli hypothetical protein ygcR.

As a signature pattern for these proteins, a conserved region which is located in the central section was selected.
[0542] Consensus pattern EIVAj-A-[KP|-\ t 2HDF0 [GDHGDEE-^(1,2t [EQj v \i \\>\. ^l-P-N-jl.lVMj^J-jTACj- 

[1] Finocchiaro G., Ikeda Y., Ito M., Tanaka K. Prog. Clin. Biol. Res. 321:637-652(1990).
[2] Tsai M.H., Saier M.H. Jr. Res. Microbiol. 146:397-404(1995).

[0543] 177. Endonuclease III signatures

Escherichia coli endonuclease III (EC 4.2.99.18) (gene nth) [1] is a DNA repair enzyme that acts both as a DNA N-glycosylase, removing oxidized pyrimidines from DNA, and as an apurinic/apyrimidinic (AP) endonuclease, introducing a single-strand nick at the site from which the damaged base was removed. Endonuclease III is an iron-sulfur protein that binds a single 4Fe-4S cluster. The 4Fe-4S cluster does not seem to be important for catalytic activity, but is probably involved in the proper positioning of the enzyme along the DNA strand [2]. Endonuclease III is evolutionary related to the following proteins:

- Fission yeast endonuclease III homolog (gene nth1) [3].
- Escherichia coli and related protein DNA repair protein mutY, which is an adenine glycosylase. MutY is a larger protein (350 amino acids) than endonuclease III (211 amino acids).
- Micrococcus luteus ultraviolet N-glycosylase/AP lyase, which initiates repair at cis-syn pyrimidine dimers.
- ORF10 in plasmid pFV1 of the thermophilic archaebacteria Methanobacterium thermoformicicum [4]. Restriction methylase m.MthTI, which is encoded by this plasmid, generates 5-methylcytosine which is subject to deamination resulting in G-T mismatches. This protein could correct these mismatches.
- Yeast hypothetical protein YAL015c.
- Fission yeast hypothetical protein SpAC26A3.02.
- Caenorhabditis elegans hypothetical protein R10E4.5.
- Methanococcus jannaschii hypothetical protein MJ0613.

The 4Fe-4S cluster is bound by four cysteines which are all located in a 17 amino acid region at the C-terminal end of endonuclease III. A similar region is also present in the central section of mutY and in the C-terminus of ORF10 and of the Micrococcus UV endonuclease. The 4Fe-4S cluster region does not exist in YAL015c. Two signature patterns for these proteins were developed: the first corresponds to the core of the iron-sulfur binding domain; the second corresponds to the best conserved region in the catalytic core of these enzymes.

[0544] Consensus pattern: C-x(3)-[KRS]-P-[KRAGL]-C-x(2)-C-x(5)-C [The four C's are 4Fe-4S ligands]
Consensus pattern EGSTJ-y-EUVMF1-P-x(5HL!VMWH^ 
*5 [LIVMFYWHGANK]- 

[1] Kuo C.-F., McRee D.E., Fisher C.L., O'Handley S.F., Cunningham R.P., Tainer J.A. Science 258:434-440(1992).
[2] Thomson A.J. Curr. Biol. 3:173-174(1993).

[3] Roldan-Arjona T., Anselmino C., Lindahl T. Nucleic Acids Res. 24:3307-3312(1996).
[4] Noelling J., van Eeden F.J.M., Eggen R.I.L., de Vos W.M. Nucleic Acids Res. 20:6501-6507(1992).

[0545] 178. (Epimerase) NAD dependent epimerase/dehydratase family

[0546] This family of proteins utilizes NAD as a cofactor. The proteins in this family use nucleotide-sugar substrates for a variety of chemical reactions.
[0547] [1] Thoden JB, Hegeman AD, Wesenberg G, Chapeau MC, Frey PA, Holden HM. Biochemistry 1997;36:6294-6304.

[0548] 179. Exonuclease

[0549] This family includes a variety of exonuclease proteins, such as ribonuclease T and the epsilon subunit of DNA polymerase III.

[0550] [1] Koonin EV, Deutscher MP. Nucleic Acids Res 1993;21:2521-2522.
[0551] 180. ENTH 
ENTH domain 

[0552] [1] Kay BK, Yamabhai M, Wendland B, Emr SD. Medline: 99156083. Identification of a novel domain shared by putative components of the endocytic and cytoskeletal machinery. Protein Sci 1999;8:435-438.
[0553] The ENTH (Epsin N-terminal homology) domain is found in proteins involved in endocytosis and cytoskeletal machinery. The function of the ENTH domain is unknown.
[0554] Number of members: 29 

[0555] 181. (eIF-1A) Eukaryotic initiation factor 1A signature

Eukaryotic translation initiation factor 1A (eIF-1A) [1] (formerly known as eIF-4C) is a protein that seems to be required for maximal rate of protein biosynthesis. It enhances ribosome dissociation into subunits and stabilizes the binding of the initiator Met-tRNA to 40S ribosomal subunits. eIF-1A is a hydrophilic protein of about 17 to 18 Kd. Archaebacteria also seem to possess an eIF-1A homolog. As a signature pattern, a conserved region in the central section of these proteins was selected.

[0556] Consensus pattern: [IM]-x-G-x-[GS]-[KRH]-x(4)-[CL]-x-D-G-x(2)-R-x(2)-[RH]-I-x-G
[0557] [1] Wei C.-L., Kainuma M., Hershey J.W.B. J. Biol. Chem. 270:22788-22794(1995).
[0558] 182. (eIF-5A) Eukaryotic initiation factor 5A hypusine signature

Eukaryotic initiation factor 5A (eIF-5A) (formerly known as eIF-4D) [1,2] is a small protein whose precise role in the initiation of protein synthesis is not known. It appears to promote the formation of the first peptide bond. eIF-5A seems to be the only eukaryotic protein to contain a hypusine residue. Hypusine is derived from lysine by the post-translational addition of a butylamino group (from spermidine) to the epsilon-amino group of lysine. The hypusine group is essential to the function of eIF-5A. A hypusine-containing protein has been found in archaebacteria such as Sulfolobus acidocaldarius or Methanococcus jannaschii; this protein is highly similar to eIF-5A and could play a similar role in protein biosynthesis. The signature developed for eIF-5A is centered around the hypusine residue.
[0559] Consensus pattern: [PT]-G-K-H-G-x-A-K [The first K is modified to hypusine]

[1] Park M.H., Wolff E.C., Folk J.E. BioFactors 4:95-104(1993).

[2] Schnier J., Schwelberger H.G., Smit-McBride Z., Kang H.A., Hershey J.W.B. Mol. Cell. Biol. 11:3105-3114(1991).
[0560] 183. (efhand) S-100/ICaBP type calcium binding protein signature

S-100 are small dimeric acidic calcium and zinc-binding proteins [1] abundant in the brain. They have two different types of calcium-binding sites: a low affinity one with a special structure and a 'normal' EF-hand type high affinity site. The vitamin-D dependent intestinal calcium-binding proteins (ICaBP or calbindin 9 Kd) also belong to this family of proteins, but they do not form dimers. In the past years the sequences of many new members of this family have been determined (for reviews see [2,3,4]); in most cases, the function of these proteins is not yet known, although it is becoming clear that they are involved in cell growth and differentiation, cell cycle regulation and metabolic control. These proteins are:

- Calcyclin (Prolactin receptor associated protein (PRA); caltropin; 2A9; 5B10; S100A6).
- Calpactin I light chain (p10; p11; 42c; S100A10).
- Calgranulin A (cystic fibrosis antigen (CFAg); MIF related protein 8 (MRP-8); p8; S100A8).
- Calgranulin B (MIF related protein 14 (MRP-14); p14; S100A9).
- Calgranulin C.
- Calgizzarin (S100C).
- Placental calcium-binding protein (CAPL) (18a2; peL98; 42a; p9K; MTS1; metastasin; S100A4).
- Protein S-100D (S100A5).
- Protein S-100E (S100A3).
- Protein S-100L (CAN19; S100A2).
- Placental protein S-100P (S100E).
- Psoriasin (S100A7).
- Chemotactic cytokine CP-10 [5].
- Protein MRP-126 [6].
- Trichohyalin [7]. This is a large intermediate filament-associated protein that associates with keratin intermediate filaments (KIF); it contains a S-100 type domain in its N-terminal extremity.

A number of these proteins are known to bind calcium while others are not (p10 for example). Our EF-hand detecting pattern will fail to pick those proteins which have lost their calcium-binding properties. A pattern was developed which unambiguously picks up proteins belonging to this family. This pattern spans the region of the EF-hand high affinity site but makes no assumptions on the calcium binding properties of this site.
[0561] Consensus pattern: [LIVMFYW](2)-x(2)-[LK]-D-x(3)-[DN]-x(3)-[DNSG]-[FY]-x-[ES]-[FYVC]-x(2)-[LIVMFS]-[LIVMF]

[1] Baudier J. (In) Calcium and Calcium Binding proteins, Gerday C., Bolis L., Gilles R., Eds., pp102-113, Springer Verlag, Berlin, (1988).

[2] Moncrief N.D., Kretsinger R.H., Goodman M. J. Mol. Evol. 30:522-562(1990).

[3] Kligman D., Hilt D.C. Trends Biochem. Sci. 13:437-443(1988).

[4] Schaefer B.W., Wicki R., Engelkamp D., Mattei M.-G., Heizmann C.W. Genomics 25:638-643(1995).
[5] Lackmann M., Cornish C.J., Simpson R.J., Moritz R.L., Geczy C.L. J. Biol. Chem. 267:7499-7504(1992).
[6] Nakano T., Graf T. Oncogene 7:527-534(1992).
[7] Lee S.-C., Kim I.-G., Marekov L.N., O'Keefe E.J., Parry D.A.D., Steinert P.M. J. Biol. Chem. 268:12164-12176(1993).

EF-hand calcium-binding domain

Many calcium-binding proteins belong to the same evolutionary family and share a type of calcium-binding domain known as the EF-hand [1 to 5]. This type of domain consists of a twelve residue loop flanked on both sides by a twelve residue alpha-helical domain. In an EF-hand loop the calcium ion is coordinated in a pentagonal bipyramidal configuration. The six residues involved in the binding are in positions 1, 3, 5, 7, 9 and 12; these residues are denoted by X, Y, Z, -Y, -X and -Z. The invariant Glu or Asp at position 12 provides two oxygens for liganding Ca (bidentate ligand). Listed below are the proteins which are known to contain EF-hand regions. For each type of protein, the total number of EF-hand regions known or supposed to exist is indicated between parentheses. This number does not include regions which clearly have lost their calcium-binding properties, or the atypical low-affinity site (which spans thirteen residues) found in the S-100/ICaBP family of proteins [6].

- Aequorin and Renilla luciferin binding protein (LBP) (Ca=3).
- Alpha actinin (Ca=2).
- Calbindin (Ca=4).
- Calcineurin B subunit (protein phosphatase 2B regulatory subunit) (Ca=4).
- Calcium-binding protein from Streptomyces erythraeus (Ca=3?).
- Calcium-binding protein from Schistosoma mansoni (Ca=2?).
- Calcium-binding proteins TCBP-23 and TCBP-25 from Tetrahymena thermophila (Ca=4?).
- Calcium-dependent protein kinases (CDPK) from plants (Ca=4).
- Calcium vector protein from amphioxus (Ca=2).
- Calcyphosin (thyroid protein p24) (Ca=4?).
- Calmodulin (Ca=4, except in yeast where Ca=3).
- Calpain small and large chains (Ca=2).
- Calretinin (Ca=6).
- Calcyclin (prolactin receptor associated protein) (Ca=2).
- Caltractin (centrin) (Ca=2 or 4).
- Cell Division Control protein 31 (gene CDC31) from yeast (Ca=2?).
- Diacylglycerol kinase (EC 2.7.1.107) (DGK) (Ca=2).
- FAD-dependent glycerol-3-phosphate dehydrogenase (EC 1.1.99.5) from mammals (Ca=1).
- Fimbrin (plastin) (Ca=2).
- Flagellar calcium-binding protein (1f8) from Trypanosoma cruzi (Ca=1 or 2).
- Guanylate cyclase activating protein (GCAP) (Ca=3).
- Inositol phospholipid-specific phospholipase C isozymes gamma-1 and delta-1 (Ca=2) [10].
- Intestinal calcium-binding protein (ICaBPs) (Ca=2).
- MIF related proteins 8 (MRP-8 or CFAG) and 14 (MRP-14) (Ca=2).
- Myosin regulatory light chains (Ca=1).
- Oncomodulin (Ca=2).
- Osteonectin (basement membrane protein BM-40) (SPARC) and proteins that contain an 'osteonectin' domain (QR1, matrix glycoprotein SC1) (see the entry <PDOC00535>) (Ca=1).
- Parvalbumins alpha and beta (Ca=2).
- Placental calcium-binding protein (18a2) (nerve growth factor induced protein 42a) (p9k) (Ca=2).
- Recoverins (visinin, hippocalcin, neurocalcin, S-modulin) (Ca=2 to 3).
- Reticulocalbin (Ca=4).
- S-100 protein, alpha and beta chains (Ca=2).
- Sarcoplasmic calcium-binding protein (SCPs) (Ca=2 to 3).
- Sea urchin proteins Spec 1 (Ca=4), Spec 2 (Ca=4?), Lps-1 (Ca=8).
- Serine/threonine protein phosphatase rdgC (EC 3.1.3.16) from Drosophila (Ca=2).
- Sorcin V19 from hamster (Ca=2).
- Spectrin alpha chain (Ca=2).
- Squidulin (optic lobe calcium-binding protein) from squid (Ca=4).
- Troponins C; from skeletal muscle (Ca=4), from cardiac muscle (Ca=3), from arthropods and molluscs (Ca=2).

There have been a number of attempts [7,8] to develop patterns that pick up EF-hand regions, but these studies were made a few years ago when not so many different families of calcium-binding proteins were known. Therefore a new pattern was developed which takes into account all published sequences. This pattern includes the complete EF-hand loop as well as the first residue which follows the loop and which seems to always be hydrophobic.

- Consensus pattern: D-x-[DNS]-{ILVFYW}-[DENSTG]-[DNQGHRK]-{GP}-[LIVMC]-[DENQSTAGC]-x(2)-[DE]-[LIVMFYW]

- Note: positions 1 (X), 3 (Y) and 12 (-Z) are the most conserved.
- Note: the 6th residue in an EF-hand loop is in most cases a Gly, but the number of exceptions to this 'rule' has gradually increased and therefore the pattern should include all the different residues which have been shown to exist in this position in functional Ca-binding sites.

- Note: the pattern will, in some cases, miss one of the EF-hand regions in some proteins with multiple EF-hand domains.
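
The coordination scheme given for the EF-hand loop (positions 1, 3, 5, 7, 9 and 12, denoted X, Y, Z, -Y, -X and -Z, with Glu or Asp at position 12 acting as a bidentate ligand) can be illustrated with a short Python sketch. The function names are assumptions made for this example, and the 12-residue string is modelled on the first calcium-binding loop of calmodulin and is used here only as test input.

```python
# Loop positions (1-based) and their conventional labels, as given in the text.
LIGAND_POSITIONS = {1: "X", 3: "Y", 5: "Z", 7: "-Y", 9: "-X", 12: "-Z"}


def ef_hand_ligands(loop: str) -> dict:
    """Map each coordination label to the residue found at that loop position."""
    if len(loop) != 12:
        raise ValueError("an EF-hand loop is expected to span twelve residues")
    return {label: loop[pos - 1] for pos, label in LIGAND_POSITIONS.items()}


def has_bidentate_ligand(loop: str) -> bool:
    """Position 12 (-Z) should be Glu or Asp, which supplies two oxygens."""
    return ef_hand_ligands(loop)["-Z"] in ("E", "D")


if __name__ == "__main__":
    loop = "DKDGDGTITTKE"  # illustrative loop, assumed for this example
    print(ef_hand_ligands(loop), has_bidentate_ligand(loop))
```

Keeping the positions in a single table makes it easy to compare the ligand residues of several loops from one protein side by side.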

[1] Kawasaki H., Kretsinger R.H. Protein Prof. 2:305-490(1995).
[2] Kretsinger R.H. Cold Spring Harbor Symp. Quant. Biol. 52:499-510(1987).

[3] Moncrief N.D., Kretsinger R.H., Goodman M. J. Mol. Evol. 30:522-562(1990).
[4] Nakayama S., Moncrief N.D., Kretsinger R.H. J. Mol. Evol. 34:416-448(1992).
[5] Heizmann C.W., Hunziker W. Trends Biochem. Sci. 16:98-103(1991).

[6] Kligman D., Hilt D.C. Trends Biochem. Sci. 13:437-443(1988).
[7] Strynadka N.C.J., James M.N.G. Annu. Rev. Biochem. 58:951-98(1989).
[8] Haiech J., Sallantin J. Biochimie 67:555-560(1985).

[9] Chauvaux S., Beguin P., Aubert J.-P., Bhat K.M., Gow L.A., Wood T.M., Bairoch A. Biochem. J. 265:261-265(1990).

[10] Bairoch A., Cox J.A. FEBS Lett. 269:454-456(1990).

[0562] 184. Enolase signature

Enolase (EC 4.2.1.11) is a glycolytic enzyme that catalyzes the dehydration of 2-phospho-D-glycerate to phosphoenolpyruvate [1]. It is a dimeric enzyme that requires magnesium both for catalysis and for stabilizing the dimer. Enolase is probably found in all organisms that metabolize sugars. In vertebrates, there are three different tissue-specific isozymes: alpha, present in most tissues; beta, in muscles; and gamma, found only in nervous tissues. Tau-crystallin, one of the major lens proteins in some fish, reptiles and birds, has been shown [2] to be evolutionary related to enolase. As a signature pattern for enolase, the best conserved region was selected; it is located in the C-terminal third of the sequence.

[0563] ConsenbUb pattern [Lhy)(3}-K- A -N-Q4-G-[cT]-[LIVHST)-[D£j-[STA] 
flllebiodal StecB &\<?s><?t JM i Biol Chem 264 3t>8f-3o03( 1<^89i 
[2]Wistovv o Piattiguiskv J baence z15 \ 554-1 S56{ 1^8 71 
[0564] 185. (F-actin_cap_A) F-actin capping protein alpha subunit signatures

The F-actin capping protein binds in a calcium-independent manner to the fast growing ends of actin filaments (barbed
end), thereby blocking the exchange of subunits at these ends. Unlike gelsolin and severin, this protein does not sever
actin filaments. The F-actin capping protein is a heterodimer composed of two unrelated subunits: alpha and beta. The
alpha subunit is a protein of about 268 to 286 amino acid residues whose sequence is well conserved in eukaryotic
species [1]. As signature patterns, two highly conserved regions in the C-terminal section of the alpha subunit were
selected.

[0565] Consensus pattern: V-H-[FY](2)-E-D-G-N-V
Consensus pattern: F-K-[AE]-L-R-R-x-L-P
[0566] [1] Cooper J.A., Caldwell J.E., Gattermeir D.J., Torres M.A., Amatruda J.F., Casella J.F. Cell Motil.
Cytoskeleton 18:204-214(1991).
[0567] 186. F-box domain

[0568] [1] Bai C, Sen P, Hofmann K, Ma L, Goebl M, Harper JW, Elledge SJ. Cell 1996;86:263-274. [2] Skowyra D,
Craig KL, Tyers M, Elledge SJ, Harper JW. Cell 1997;91:209-219.
[0569] 187. F-protein

Negative factor. (F Protein) or Nef. 

[0570] [1] Arold S, Franken P, Strub M-P, Hoh F, Benichou S, Benarous R, Dumas C. "The crystal
structure of HIV-1 Nef protein bound to the Fyn kinase SH3 domain suggests a role for this complex in altered T cell
receptor signalling." Structure 1997;5:1361-1372.
[0571] Nef protein accelerates virulent progression of AIDS by its interaction with cellular proteins involved in signal
transduction and host cell activation. Nef has been shown to bind specifically to a subset of the Src kinase family.
[0572] Number of members: 1013 
[0573] 188. (FAD_binding_2) 

Fumarate reductase / succinate dehydrogenase FAD-binding site. In bacteria two distinct membrane-bound enzyme
complexes are responsible for the interconversion of fumarate and succinate (EC 1.3.99.1): fumarate reductase (Frd)
is used in anaerobic growth, and succinate dehydrogenase (Sdh) is used in aerobic growth. Both complexes consist
of two main components: a membrane-extrinsic component composed of a FAD-binding flavoprotein and an iron-sulfur
protein; and a hydrophobic component composed of a membrane anchor protein and/or a cytochrome B.
[0574] In eukaryotes, mitochondrial succinate dehydrogenase (ubiquinone) (EC 1.3.5.1) is an enzyme composed of
two subunits: a FAD flavoprotein and an iron-sulfur protein.

[0575] The flavoprotein subunit is a protein of about 60 to 70 Kd to which FAD is covalently bound to a histidine
residue which is located in the N-terminal section of the protein [1]. The sequence around that histidine is well conserved
in Frd and Sdh from various bacterial and eukaryotic species [2] and can be used as a signature pattern.
[0576] Consensus pattern: R-[ST]-H-[ST]-x(2)-A-x-G-G [H is the FAD binding site] Sequences known to belong to this
class detected by the pattern: ALL.

[1] Blaut M., Whittaker K., Valdovinos A., Ackrell B.A., Gunsalus R.P., Cecchini G. J. Biol. Chem. 264:13599-13604(1989).
[2] Birch-Machin M.A., Farnsworth L., Ackrell B.A., Cochran B., Jackson S., Bindoff L.A., Aitken A., Diamond A.G.,
Turnbull D.M. J. Biol. Chem. 267:11553-11558(1992).

[0577] 189. Fatty acid desaturases signatures (FA_desaturase)

Fatty acid desaturases (EC 1.14.99.-) are enzymes that catalyze the insertion of a double bond at the delta position
of fatty acids. There seem to be two distinct families of fatty acid desaturases which do not seem to be evolutionarily
related.

Family 1 is composed of:
- Stearoyl-CoA desaturase (SCD) (EC 1.14.99.5) [1]. SCD is a key regulatory enzyme of unsaturated fatty acid
biosynthesis. SCD introduces a cis double bond at the delta(9) position of fatty acyl-CoA's such as palmitoleoyl- and
oleoyl-CoA. SCD is a membrane-bound enzyme that is thought to function as a part of a multienzyme complex in the
endoplasmic reticulum of vertebrates and fungi.
As a signature pattern for this family, a conserved region in the C-terminal part of these enzymes was selected; this
region is rich in histidine residues and in aromatic residues.

Family 2 is composed of:
- Plant stearoyl-acyl-carrier-protein desaturase (EC 1.14.99.6) [2]. These enzymes catalyze the introduction of a
double bond at the delta(9) position of stearoyl-ACP to produce oleoyl-ACP. This enzyme is responsible for the
conversion of saturated fatty acids to unsaturated fatty acids in the synthesis of vegetable oils.
- Cyanobacteria desA [3], an enzyme that can introduce a second cis double bond at the delta(12) position of fatty
acid bound to membrane glycerolipids. DesA is involved in chilling tolerance, the phase transition temperature of
lipids of cellular membranes being dependent on the degree of unsaturation of fatty acids of the membrane lipids.
As a signature pattern for this family, a conserved region in the C-terminal part of these enzymes was selected.

[0578] Consensus pattern: G-E-x-[FY]-H-N-[FY]-H-H-x-F-P-x-D-Y
Consensus pattern: [ST]-[SA]-x(3)-[QR]-[LI]-x(5,6)-D-Y-x(2)-[LIVMFYW]-[LIVM]-[DE]

[1] Kaestner K.H., Ntambi J.M., Kelly T.J. Jr., Lane M.D. J. Biol. Chem. 264:14755-14761(1989).
[2] Shanklin J., Somerville C.R. Proc. Natl. Acad. Sci. U.S.A. 88:2510-2514(1991).
[3] Wada H., Gombos Z., Murata N. Nature 347:200-203(1990).

[0579] 190. Fructose-1-6-bisphosphatase active site (FBPase)

Fructose-1,6-bisphosphatase (EC 3.1.3.11) (FBPase) [1], a regulatory enzyme in gluconeogenesis, catalyzes the
hydrolysis of fructose 1,6-bisphosphate to fructose 6-phosphate. It is involved in many different metabolic pathways and
found in most organisms. Sedoheptulose-1,7-bisphosphatase (EC 3.1.3.37) (SBPase) [2] is an enzyme found in plant
chloroplasts and in photosynthetic bacteria that catalyzes the hydrolysis of sedoheptulose 1,7-bisphosphate to
sedoheptulose 7-phosphate, a step in the Calvin's reductive pentose phosphate cycle. It is functionally and structurally
related to FBPase. In mammalian FBPase, a lysine residue has been shown to be involved in the catalytic mechanism
[3]. The region around this residue is highly conserved and can be used as a signature pattern for FBPase and SBPase.

It must be noted that, in some bacterial FBPase sequences, the active site lysine is replaced by an arginine.
Consensus pattern: [AG]-[RK]-L-x(1,2)-[LIV]-[FY]-E-x(2)-P-[LIVM]-[GSA] [K/R is the active site residue]
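
Since the pattern marks the second position as the active-site residue (K, or R in some bacterial sequences), a match can also report where that residue sits. The following Python sketch assumes the pattern is written by hand as a regular expression with a capture group on the [RK] position; the function name `find_active_site` and the test sequences are hypothetical and serve only as illustration.

```python
import re

# Regex form of the consensus pattern above; the parentheses capture the [RK] position.
FBPASE_SITE = re.compile(r"[AG]([RK])L.{1,2}[LIV][FY]E.{2}P[LIVM][GSA]")

def find_active_site(sequence):
    """Return (1-based position, residue) of the putative active-site K/R, or None."""
    m = FBPASE_SITE.search(sequence)
    if m is None:
        return None
    return m.start(1) + 1, m.group(1)

# Hypothetical sequences constructed for illustration, not real FBPase entries.
print(find_active_site("MSTTGKLRLYEANPMAFLV"))  # lysine form
print(find_active_site("MSTTGRLRLYEANPMAFLV"))  # arginine form
print(find_active_site("MSTTGHLRLYEANPMAFLV"))  # neither K nor R at the site: no match
```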

[1] Benkovic S.J., DeMaine M.M. Adv. Enzymol. 53:45-82(1982).
[2] Raines C.A., Lloyd J.C., Willingham N.M., Potts S., Dyer T.A. Eur. J. Biochem. 205:1053-1059(1992).
[3] Ke H., Thorpe C.M., Seaton B.A., Lipscomb W.N., Marcus F. J. Mol. Biol. 212:513-539(1989).

[0580] 191. FGGY family of carbohydrate kinases signatures

It has been shown [1] that four different types of carbohydrate kinases seem to be evolutionarily related. These enzymes
are:
- L-fuculokinase (EC 2.7.1.51) (gene fucK).
- Gluconokinase (EC 2.7.1.12) (gene gntK).
- Glycerokinase (EC 2.7.1.30) (gene glpK).
- Xylulokinase (EC 2.7.1.17) (gene xylB).
- L-xylulose kinase (EC 2.7.1.53) (gene lyxK).
These enzymes are proteins of from 480 to 520 amino acid residues. As consensus patterns for this family of kinases, two
conserved regions were selected: one in the central section, the other in the C-terminal section.
[0581] Consensus pattern: [MFYGS]-x-[PST]-x(2)-K-[LIVMFYW]-x-W-[LIVMF]-x-[DENQTKR]-[ENQH]
Consensus pattern: [GSA]-x-[LIVMFYW]-x-G-[LIVM]-x(7,8)-[HDENQ]-[LIVMF]-x(2)-[AS]-[STAIVM]-[LIVMFY]-[DEQ]

[0582] [1] Reizer A., Deutscher J., Saier M.H. Jr., Reizer J. Mol. Microbiol. 5:1081-1089(1991).
[0583] 192. FKBP-type peptidyl-prolyl cis-trans isomerase signatures/profile (FKBP)

FKBP [1,2,3] is the major high-affinity binding protein, in vertebrates, for the immunosuppressive drug FK506. It exhibits
peptidyl-prolyl cis-trans isomerase activity (EC 5.2.1.8) (PPIase or rotamase). PPIase is an enzyme that accelerates
protein folding by catalyzing the cis-trans isomerization of proline imidic peptide bonds in oligopeptides [4]. At least
three different forms of FKBP are known in mammalian species:
- FKBP-12, which is cytosolic and inhibited by both FK506 and rapamycin.
- FKBP-13, which is membrane associated and inhibited by both FK506 and rapamycin.
- FKBP-25, which is preferentially inhibited by rapamycin.
These forms of FKBP are evolutionarily related and show extensive similarities [5,6,7] with the following proteins:
- Fungal FKBP.
- Mammalian hsp binding immunophilin (HBI) (also called p59). HBI is a protein which binds to hsp90 and contains two
FKBP-like domains in its N-terminal section, the first of which seems to be functional.
- The C-terminal part of the cell-surface protein mip from Legionella; a protein associated with macrophage infection
by an unknown mechanism.
- Escherichia coli slyD [8], a protein with an N-terminal FKBP domain followed by a histidine-rich metal-binding domain.
- Escherichia coli fkpA.
- Escherichia coli fklB (FKBP22).
- Escherichia coli slpA.
- Bacterial trigger factor (tig).
- Streptomyces hygroscopus and chrysomallus FK506-binding protein.
- Chlamydia trachomatis 27 Kd membrane protein.
- Neisseria meningitidis strain C114 PPIase.
- Probable PPIases from Haemophilus influenzae (HI0754), Methanococcus jannaschii (MJ0278 and MJ0825), Pseudomonas
fluorescens and Pseudomonas aeruginosa.
Two signature patterns for these proteins were developed. One is based on a conserved region in the N-terminus of FKBP;
the other is located in the central section. The profile for FKBP spans the complete domain.

[0584] Consensus pattern: [LIVMC]-x-[YF]-x-[GVL]-x(1,2)-[LFT]-x(2)-G-x(3)-[DE]-[STAEQK]-[STAN]
[0585] Consensus pattern: [LIVMFY]-x(2)-[GA]-x(3,4)-[LIVMF]-x(2)-[LIVMFHK]-x(2)-G-x(4)-[LIVMF]-x(3)-[PSGAQ]-x(2)-[AG]-[FY]-G

[1] Tropschug M., Wachter E., Mayer S., Schoenbrunner E.R., Schmid F.X. Nature 346:674-677(1990).
[2] Stein R.L. Curr. Biol. 1:234-236(1991).
[3] Siekierka J.J., Wiederrecht G., Greulich H., Boulton D., Hung S.H.Y., Cryan J., Hodges P.J., Sigal N.H. J. Biol.
Chem. 265:21011-21015(1990).
[4] Fischer G., Schmid F.X. Biochemistry 29:2205-2212(1990).
[5] Trandinh C.C., Pao G.M., Saier M.H. Jr. FASEB J. 6:3410-3420(1992).
[6] Galat A. Eur. J. Biochem. 216:689-707(1993).
[7] Hacker J., Fischer G. Mol. Microbiol. 10:445-456(1993).
[8] Wuelfing C., Lombardero J., Plueckthun A. J. Biol. Chem. 269:2895-2901(1994).

[0586] 193. MAPEG family (aka FLAP/GST2/LTC4S family signature)
[0587] The following mammalian proteins are evolutionarily related [1]:

- Leukotriene C4 synthase (EC 2.5.1.37) (gene LTC4S), an enzyme that catalyzes the production of LTC4 from LTA4.
- Microsomal glutathione S-transferase II (EC 2.5.1.18) (GST-II) (gene GST2), an enzyme that can also produce
LTC4 from LTA4.
- 5-lipoxygenase activating protein (gene FLAP), a protein that seems to be required for the activation of
5-lipoxygenase.

[0588] These are proteins of 150 to 160 residues that contain three transmembrane segments. As a signature pattern,
a conserved region between the first and second transmembrane domains was selected.
[0589] Consensus pattern: G-x(3)-F-E-R-V-[FY]-x-A-[NQ]-x-N-C

[0590] [1] Jakobsson P.-J., Mancini J.A., Ford-Hutchinson A.W. J. Biol. Chem. 271:22203-22210(1996).
[0591] 194. FMN-dependent alpha-hydroxy acid dehydrogenases active site (FMN_dh)

A number of oxidoreductases that act on alpha-hydroxy acids and which are FMN-containing flavoproteins have been
shown [1,2,3] to be structurally related; these enzymes are:
- Lactate dehydrogenase (EC 1.1.2.3), which consists of a dehydrogenase domain and a heme-binding domain called
cytochrome b2 and which catalyzes the conversion of lactate into pyruvate.
- Glycolate oxidase (EC 1.1.3.15) ((S)-2-hydroxy-acid oxidase), a peroxisomal enzyme that catalyzes the conversion of
glycolate and oxygen to glyoxylate and hydrogen peroxide.
- Long chain alpha-hydroxy acid oxidase from rat (EC 1.1.3.15), a peroxisomal enzyme.
- Lactate 2-monooxygenase (EC 1.13.12.4) (lactate oxidase) from Mycobacterium smegmatis, which catalyzes the
conversion of lactate and oxygen to acetate, carbon dioxide and water.
- (S)-mandelate dehydrogenase from Pseudomonas putida (gene mdlB), which catalyzes the reduction of (S)-mandelate
to benzoylformate.
The first step in the reaction mechanism of these enzymes is the abstraction of the proton from the alpha-carbon of
the substrate, producing a carbanion which can subsequently attach to the N5 atom of FMN. A conserved histidine has
been shown [4] to be involved in the removal of the proton. The region around this active site residue is highly
conserved and contains an arginine residue which is involved in substrate binding.
[0592] Consensus pattern: S-N-H-G-[AG]-R-Q [H is the active site residue] [R is a substrate-binding residue]

[1] Giegel D.A., Williams C.H. Jr., Massey V. J. Biol. Chem. 265:6626-6632(1990).
[2] Tsou A.Y., Ransom S.C., Gerlt J.A., Buechter D.D., Babbitt P.C., Kenyon G.L. Biochemistry 29
(1990).
[3] Le K.H.D., Lederer F. J. Biol. Chem. 266:20877-20881(1991).
[4] Lindqvist Y., Branden C.-I. J. Biol. Chem. 264:3624-3628(1989).

[0593] 195. Flavin-binding monooxygenase-like (FMO-like)
[0594] This family includes FMO proteins, cyclohexanone monooxygenase
[0595] 196. (FPGS)

Folylpolyglutamate synthase signatures (aka Mur_ligase)

[0596] Folylpolyglutamate synthase (EC 6.3.2.17) (FPGS) [1] is the enzyme of folate metabolism that catalyzes
ATP-dependent addition of glutamate moieties to tetrahydrofolate.
[0597] Its sequence is moderately conserved between prokaryotes (gene folC) and eukaryotes. We developed two
signature patterns based on the conserved regions, which are rich in glycine residues and could play a role in the
catalytic activity and/or in substrate binding.

[0598] Consensus pattern: [LIVMFY]-x-[LIVM]-[STAG]-G-T-[NK]-G-K-x-[ST]-x(7)-[LIVM](2)-x(3)-[GSK] Sequences
known to belong to this class detected by the pattern: ALL.
[0599] Consensus pattern: [LIVMFY](2)-E-x-G-[LIVM]-[GA]-G-x(2)-D-x-[GST]-x-[LIVM](2) Sequences known to
belong to this class detected by the pattern: ALL.

[0600] [1] Shane B., Garrow T., Brenner A., Chen L., Choi Y.J., Hsu J.C., Stover P. Adv. Exp. Med. Biol. 336:629-634
(1993).

[0601] 197. FYVE zinc finger
[0602] The FYVE zinc finger is named after the four proteins in which it was found: Fab1, YOTB/ZK632.12, Vac1,
and EEA1. The FYVE finger has been shown to bind two Zn++ ions [1]. The FYVE finger has eight potential zinc
coordinating cysteine positions. Many members of this family also include two histidines in a motif R+HHC+XCG, where
+ represents a charged residue and X any residue. Members were included which do not conserve these histidine
residues but are clearly related.
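
The R+HHC+XCG motif can be checked with a short regular expression. The Python sketch below rests on one assumption that the text does not spell out: "charged residue" is read here as D, E, K or R. The name `has_fyve_core` and both test fragments are hypothetical. As noted above, some family members lack the two histidines, so a negative result from this check does not exclude membership.

```python
import re

# Assumption for this sketch: "charged residue" (+) is read as D, E, K or R.
CHARGED = "[DEKR]"
FYVE_CORE = re.compile("R" + CHARGED + "HHC" + CHARGED + ".CG")

def has_fyve_core(sequence: str) -> bool:
    """True if the sequence contains the R+HHC+XCG motif under the assumption above."""
    return FYVE_CORE.search(sequence) is not None

# Hypothetical fragments for illustration only.
print(has_fyve_core("AAWRKHHCRACGKIF"))  # both histidines present
print(has_fyve_core("AAWRKSSCRACGKIF"))  # histidines not conserved
```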

[0603] [1] Stenmark H, Aasland R, Toh BH, D'Arrigo A. J Biol Chem 1996;271:24048-24054. [2] Gaullier JM,
Simonsen A, D'Arrigo A, Bremnes B, Stenmark H, Aasland R. Nature 1998;394:432-433.
[0604] 198. F_actin_cap_B
F-actin capping protein beta subunit signature

[0605] The F-actin capping protein binds in a calcium-independent manner to the fast growing ends of actin filaments
(barbed end), thereby blocking the exchange of subunits at these ends. Unlike gelsolin and severin, this protein does
not sever actin filaments. The F-actin capping protein is a heterodimer composed of two unrelated subunits: alpha and
beta.

[0606] The beta subunit is a protein of about 280 amino acid residues whose sequence is well conserved in eukaryotic
species [1]. As a signature pattern, a conserved hexapeptide in the N-terminal section of the beta subunit was selected.
[0607] Consensus pattern: C-D-Y-N-R-D Sequences known to belong to this class detected by the pattern: ALL.
[0608] [1] Amatruda J.F., Cannon J.F., Tatchell K., Hug C., Cooper J.A. Nature 344:352-354(1990).
[0609] 199. Isopenicillin N synthetase signatures (Fe_Asc_oxidored)

Isopenicillin N synthetase (IPNS) [1,2] is a key enzyme in the biosynthesis of penicillin and cephalosporin. In the
presence of oxygen, iron and ascorbate, it removes four hydrogen atoms from L-(alpha-aminoadipyl)-L-cysteinyl-d-valine
to form the azetidinone and thiazolidine rings of isopenicillin. IPNS is an enzyme of about 330 amino-acid residues.
Two cysteines are conserved in fungal and bacterial IPNS sequences; these may be involved in iron-binding and/or
substrate-binding. Cephalosporium acremonium DAOCS/DACS [3] is a bifunctional enzyme involved in cephalosporin
biosynthesis. The DAOCS domain, which is structurally related to IPNS, catalyzes the step from penicillin N to
deacetoxy-cephalosporin C, which is used as a substrate by DACS to form deacetylcephalosporin C. Streptomyces clavuligerus
possesses a monofunctional DAOCS enzyme (gene cefE) [4] also related to IPNS. Two signature patterns for these
enzymes were derived, centered around the conserved cysteine residues.
[0610] Consensus pattern: [RK]-x-[STA]-x(2)-S-x-C-Y-[SL]
Consensus pattern: [LIVM](2)-x-C-G-[STA]-x(2)-[STAG]-x(2)-T-x-[DNG]

[1] Martin J.F. Trends Biotechnol. 5:306-308(1987).
[2] Chen G., Shiffman D., Mevarech M., Aharonowitz Y. Trends Biotechnol. 8:105-111(1990).
[3] Samson S.M., Dotzlaf J.E., Slisz M.L., Becker G.W., van Frank R.M., Veal L.E., Yeh W.K., Miller J.R., Queener
S.W., Ingolia T.D. Bio/Technology 5:1207-1214(1987).
[4] Kovacevic S., Weigel B.J., Tobin M.B., Ingolia T.D., Miller J.R. J. Bacteriol. 171:754-760(1989).

[0611] 200. Fibrillarin signature 

Fibrillarin [1] is a component of a nucleolar small nuclear ribonucleoprotein (SnRNP) particle thought to participate in
the first step of the processing of pre-rRNA. In mammals, fibrillarin is associated with the U3, U8 and U13 small nuclear
RNAs [2]. Fibrillarin is an extremely well conserved protein of about 320 amino acid residues. Structurally it consists
of three different domains:
- An N-terminal domain of about 80 amino acids which is very rich in glycine and contains a number of dimethylated
arginine residues (DMA).
- A central domain of about 90 residues which resembles that of RNA-binding proteins and contains an octameric
sequence similar to the RNP-2 consensus found in such proteins.
- A C-terminal alpha-helical domain.
A protein evolutionarily related to fibrillarin has been found [3] in archaebacteria such as Methanococcus vannielii
or voltae. This protein (gene flpA) is involved in pre-rRNA processing. It lacks the Gly/Arg-rich N-terminal domain.
As a signature pattern, a region was selected that starts with and encompasses the RNP-2 like octapeptide sequence.

[0612] Consensus pattern: [GST]-[LIVMAP]-V-Y-A-[IV]-E-[FY]-[SA]-x-R-x(2)-R-[DE]

[1] Aris J.P., Blobel G. Proc. Natl. Acad. Sci. U.S.A. 88:931-935(1991).
[2] Bandziulis R.J., Swanson M.S., Dreyfuss G. Genes Dev. 3:431-437(1989).
[3] Agha-Amiri K. J. Bacteriol. 176:2124-2127(1994).

[0613] 201. Filamin/ABP280 repeat

[0614] [1] Fucini P, Renner C, Herberhold C, Noegel AA, Holak TA. Nat Struct Biol 1997;4:223-230.
[0615] 202. Fucosyl transferase

[0616] This family of fucosyltransferases comprises the enzymes that transfer fucose from GDP-fucose to GlcNAc in an
alpha1,3 linkage.

[0617] [1] Breton C, Oriol R, Imberty A. Glycobiology 1998;8:87-94.

[0618] 203. 2Fe-2S ferredoxins, iron-sulfur binding region signature (fer2A)

Ferredoxins [1] are a group of iron-sulfur proteins which mediate electron transfer in a wide variety of metabolic
reactions. Ferredoxins can be divided into several subgroups depending upon the physiological nature of the iron-sulfur
cluster(s) and according to sequence similarities. One of these subgroups is the 2Fe-2S ferredoxins, which are
proteins or domains of around one hundred amino acid residues that bind a single 2Fe-2S iron-sulfur cluster. The proteins
that are known [2] to belong to this family are listed below.
- Ferredoxin from photosynthetic organisms; namely plants and algae, where it is located in the chloroplast or
cyanelle, and cyanobacteria.
- Ferredoxin from archaebacteria of the Halobacterium genus.
- Ferredoxin IV (gene pftA) and V (gene fdxD) from Rhodobacter capsulatus.
- Ferredoxin in the toluene degradation operon (gene xylT) and naphthalene degradation operon (gene nahT) of
Pseudomonas putida.
- Hypothetical Escherichia coli protein yfaE.
- The N-terminal domain of the bifunctional ferredoxin/ferredoxin reductase electron transfer component of the
benzoate 1,2-dioxygenase complex (gene benC) from Acinetobacter calcoaceticus, the toluene 4-monooxygenase complex
(gene tmoF), the toluate 1,2-dioxygenase system (gene xylZ), and the xylene monooxygenase system (gene xylA) from
Pseudomonas.
- The N-terminal domain of phenol hydroxylase protein p5 (gene dmpP) from Pseudomonas putida.
- The N-terminal domain of methane monooxygenase component C (gene mmoC) from Methylococcus capsulatus.
- The C-terminal domain of the vanillate degradation pathway protein vanB in a Pseudomonas species.
- The N-terminal domain of bacterial fumarate reductase iron-sulfur protein (gene frdB).
- The N-terminal domain of CDP-6-deoxy-3,4-glucoseen reductase (gene ascD) from Yersinia pseudotuberculosis.
- The central domain of eukaryotic succinate dehydrogenase (ubiquinone) iron-sulfur protein.
- The N-terminal domain of eukaryotic xanthine dehydrogenase.
- The N-terminal domain of eukaryotic aldehyde oxidase.
In the 2Fe-2S ferredoxins, four cysteine residues bind the iron-sulfur cluster. Three of these cysteines are clustered
together in the same region of the protein. Our signature pattern spans that iron-sulfur binding region.
[0619] Consensus pattern: C-{C}-{C}-[GA]-{C}-C-[GAST]-{CPDEKRHFYW}-C [The three C's are 2Fe-2S ligands]

[1] Meyer J. Trends Ecol. Evol. 3:222-226(1988).
[2] Harayama S., Polissi A., Rekik M. FEBS Lett. 285:85-88(1991).
[0620] Adrenodoxin family, iron-sulfur binding region signature (fer2B) 

Ferredoxins [1] are a group of iron-sulfur proteins which mediate electron transfer in a wide variety of metabolic
reactions. Ferredoxins can be divided into several subgroups depending upon the physiological nature of the iron-sulfur
cluster(s) and according to sequence similarities. One family of ferredoxins groups together the following proteins that
all bind a single 2Fe-2S iron-sulfur cluster:
- Adrenodoxin (ADX) (adrenal ferredoxin), a vertebrate mitochondrial protein which transfers electrons from
adrenodoxin reductase to cytochrome P450scc, which is involved in cholesterol side chain cleavage.
- Putidaredoxin (PTX), a Pseudomonas putida protein which transfers electrons from putidaredoxin reductase to
cytochrome P450-cam, which is involved in the oxidation of camphor.
- Terpredoxin [2], a Pseudomonas protein which transfers electrons from terpredoxin reductase to cytochrome
P450-terp, which is involved in the oxidation of alpha-terpineol.
- Rhodocoxin [3], a Rhodococcus protein which transfers electrons from rhodocoxin reductase to cytochrome CYP116
(thcB), which is involved in the degradation of thiocarbamate herbicides.
- Escherichia coli ferredoxin (gene fdx) [4], whose exact function is not yet known.
- Rhodobacter capsulatus ferredoxin VI [5], which may transfer electrons to a yet uncharacterized oxygenase.
- Caulobacter crescentus ferredoxin (gene fdxB) [6].
In these proteins, four cysteine residues bind the iron-sulfur cluster. Three of these cysteines are clustered together
in the same region of the protein. Our signature pattern spans that iron-sulfur binding region.

[0621] Consensus pattern: C-x(2)-[STAQ]-x-[STAMV]-C-[STA]-T-C-[HR] [The three C's are 2Fe-2S ligands]

[1] Meyer J. Trends Ecol. Evol. 3:222-226(1988).
[2] Peterson J.A., Lu J.-Y., Geisselsoder J., Graham-Lorence S., Carmona C., Witney F., Lorence M.C. J. Biol.
Chem. 267:14193-14203(1992).
[3] Nagy I., Schoofs G., Compernolle F., Proost P., Vanderleyden J., De Mot R. J. Bacteriol. 177:676-687(1995).
[4] Ta D.T., Vickery L.E. J. Biol. Chem. 267:11120-11125(1992).
[5] Naud I., Vincon M., Garin J., Gaillard J., Forest E., Jouanneau Y. Eur. J. Biochem. 222:933-939(1994).
[6] Amemiya K. EMBL/Genbank: X51807.

[0622] 204. 4Fe-4S ferredoxins, iron-sulfur binding region signature (fer4)

Ferredoxins [1] are a group of iron-sulfur proteins which mediate electron transfer in a wide variety of metabolic
reactions. Ferredoxins can be divided into several subgroups depending upon the physiological nature of the iron-sulfur
cluster(s). One of these subgroups is the 4Fe-4S ferredoxins, which are found in bacteria and which are thus often
referred to as 'bacterial-type' ferredoxins. The structure of these proteins [2] consists of the duplication of a domain
of twenty-six amino acid residues; each of these domains contains four cysteine residues that bind to a 4Fe-4S center.
A number of proteins have been found [2] that include one or more 4Fe-4S binding domains similar to those of
bacterial-type ferredoxins. These proteins are listed below (references are only provided for recently determined
sequences).

[0623] - The iron-sulfur proteins of the succinate dehydrogenase and the fumarate reductase complexes (EC 1.3.99.1).
These enzyme complexes, which are components of the tricarboxylic acid cycle, each contain three subunits: a
flavoprotein, an iron-sulfur protein, and a b-type cytochrome. The iron-sulfur proteins contain three different iron-sulfur
centers: a 2Fe-2S, a 3Fe-3S and a 4Fe-4S.
- Escherichia coli anaerobic glycerol-3-phosphate dehydrogenase (EC 1.1.99.5). This enzyme is composed of three
subunits: A, B, and C. The C subunit seems to be an iron-sulfur protein with two ferredoxin-like domains in the
N-terminal part of the protein.
- Escherichia coli anaerobic dimethyl sulfoxide reductase. The B subunit of this enzyme (gene dmsB) is an iron-sulfur
protein with four 4Fe-4S ferredoxin-like domains.
- Escherichia coli formate hydrogenlyase. Two of the subunits of this oligomeric complex (genes hycB and hycF) seem
to be iron-sulfur proteins that each contain two 4Fe-4S ferredoxin-like domains.
- Methanobacterium formicicum formate dehydrogenase (EC 1.2.1.2). This enzyme is used by the archaebacteria to grow
on formate. The beta chain of this dimeric enzyme probably binds two 4Fe-4S centers.
- Escherichia coli formate dehydrogenases N and O (EC 1.2.1.2). The beta chains of these two enzymes (genes fdnH and
fdoH) are iron-sulfur proteins with four 4Fe-4S ferredoxin-like domains.
- Desulfovibrio periplasmic [Fe] hydrogenase (EC 1.18.99.1). The large chain of this dimeric enzyme binds three
4Fe-4S centers, two of which are located in the ferredoxin-like N-terminal region of the protein.
- Methanobacterium thermoautotrophicum methyl viologen-reducing hydrogenase subunit mvhB, which contains six tandemly
repeated ferredoxin-like domains and which probably binds twelve 4Fe-4S centers.
- Salmonella typhimurium anaerobic sulfite reductase (EC 1.8.1.-) [4]. Two of the subunits of this enzyme (genes asrA
and asrC) seem to both bind two 4Fe-4S centers.
- A ferredoxin-like protein (gene fixX) from the nitrogen-fixation genes locus of various Rhizobium species, and one
from the Nif-region of Azotobacter species.
- The 9 Kd polypeptide of chloroplast photosystem I [5] (gene psaC). This protein contains two low potential 4Fe-4S
centers, referred to as the A and B centers.
- The chloroplast frxB protein, which is predicted to carry two 4Fe-4S centers.
- A ferredoxin from a primitive eukaryote, the enteric amoeba Entamoeba histolytica.
- Escherichia coli hypothetical protein yjjW, a protein with an N-terminal region belonging to the radical activating
enzymes family (see <PDOC00834>) and two potential 4Fe-4S centers.
The pattern of cysteine residues in the iron-sulfur region is sufficient to detect this class of 4Fe-4S binding proteins.

[0624] Consensus pattern: C-x(2)-C-x(2)-C-x(3)-C-[PEG] [The four C's are 4Fe-4S ligands]
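
Several of the proteins listed above carry more than one ferredoxin-like domain, so it can be useful to count how often the cysteine pattern occurs in a sequence. The Python sketch below shows one way this could be done, with the consensus pattern written by hand as a regular expression. The function name `count_4fe4s_motifs` and the test sequence are hypothetical, and a motif count is only a rough indication of the number of 4Fe-4S binding domains.

```python
import re

# Regex form of C-x(2)-C-x(2)-C-x(3)-C-[PEG].
FER4 = re.compile(r"C.{2}C.{2}C.{3}C[PEG]")

def count_4fe4s_motifs(sequence: str) -> int:
    """Count non-overlapping occurrences of the 4Fe-4S cysteine pattern."""
    return sum(1 for _ in FER4.finditer(sequence))

# Hypothetical sequence with two ferredoxin-like cysteine clusters, for illustration only.
seq = "AYVINDSCIACGACKPECPVNIQQGSIYAIDADSCIDCGSCASVCPVGAPNPED"
print(count_4fe4s_motifs(seq))
```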

[1] Meyer J. Trends Ecol. Evol. 3:222-226(1988).
[2] Otaka E., Ooi T. J. Mol. Evol. 26:257-267(1987).
[3] Beinert H. FASEB J. 4:2483-2492(1990).
[4] Huang C.J., Barrett E.L. J. Bacteriol. 173:1544-1553(1991).
[5] Knaff D.B. Trends Biochem. Sci. 13:460-461(1988).

[0625] 205. NifH/frxC family signatures (fer4_NifH)

Nitrogenase (EC 1.18.6.1) [1] is the enzyme system responsible for biological nitrogen fixation. Nitrogenase is an
oligomeric complex which consists of two components: component 1, which contains the active site for the reduction
of nitrogen to ammonia, and component 2 (also called the iron protein). Component 2 is a homodimer of a protein (gene
nifH) which binds a single 4Fe-4S iron-sulfur cluster [2]. In the nitrogen fixation process nifH is first reduced by a
protein such as ferredoxin; the reduced protein then transfers electrons to component 1 with the concomitant
consumption of ATP. A number of proteins are known to be evolutionarily related to nifH. These proteins are:
- Chloroplast encoded frxC (or chlL) protein [3]. FrxC is encoded on the chloroplast genome of some plant species; its
exact function is not known, but it could act as an electron carrier in the conversion of protochlorophyllide to
chlorophyllide.
- Rhodobacter capsulatus proteins bchL and bchX [4]. These proteins are also likely to play a role in chlorophyll
synthesis.
There are a number of conserved regions in the sequence of these proteins: in the N-terminal section there is an
ATP-binding site motif 'A' (P-loop) and in the central section there are two conserved cysteines which have been shown,
in nifH, to be the ligands of the 4Fe-4S cluster. Two signature patterns that correspond to the regions around these
cysteines were developed.
[0626] Consensus pattern: E-x-G-G-P-x(2)-[GA]-x-G-C-[AG]-G [C binds the iron-sulfur center]
Consensus pattern: D-x-L-G-D-V-V-C-G-G-F-[AG]-x-P [C binds the iron-sulfur center]

[1] Pau R.N. Trends Biochem. Sci. 14:183-186(1989).
[2] Georgiadis M.M., Komiya H., Chakrabarti P., Woo D., Kornuc J.J., Rees D.C. Science 257:1653-1659(1992).
[3] Fujita Y., Takahashi Y., Kohchi T., Ozeki H., Ohyama K., Matsubara H. Plant Mol. Biol. 13:551-561(1989).
[4] Burke D.H., Alberti M., Hearst J.E. J. Bacteriol. 175:2407-2413(1993).

[0627] 206. Ferritin iron-binding regions signatures

Ferritin [1,2] is one of the major non-heme iron storage proteins. It consists of a mineral core of hydrated ferric oxide, and a multi-subunit protein shell which englobes the former and assures its solubility in an aqueous environment. In animals the protein is mainly cytoplasmic and there are generally two or more genes that encode closely related subunits (in mammals there are two subunits which are known as H(eavy) and L(ight)). In plants ferritin is found in the chloroplast [3]. There are a number of well conserved regions in the sequence of ferritins. Two of these regions were selected to develop signature patterns. The first pattern is located in the central part of the sequence of ferritin and it contains three conserved glutamates which are thought to be involved in the binding of iron. The second pattern is located in the C-terminal section; it corresponds to a region which forms a hydrophilic channel through which small molecules and ions can gain access to the central cavity of the molecule; this pattern also includes conserved acidic residues which are potential metal-binding sites.

[0628] Consensus pattern: E-x-[KR]-E-x(2)-E-[KR]-[LF]-[LIVMA]-x(2)-Q-N-x-R-x-G-R [The 3 E's are potential iron ligands]

Consensus pattern: D-x(2)-[LIVMF]-[STAC]-[DH]-F-[LI]-[EN]-x(2)-[FY]-L-x(6)-[LIVM]-[KN] [The second D and the E are potential iron ligands]

[1] Crichton R.R., Charloteaux-Wauters M. Eur. J. Biochem. 164:485-506(1987).
[2] Theil E.C. Annu. Rev. Biochem. 56:289-315(1987).
[3] Ragland M., Briat J.-F., Gagnon J., Laulhere J.-P., Massenet O., Theil E.C. J. Biol. Chem. 265:18339-18344(1990).

[0629] 207. Intermediate filaments signature (filament)

Intermediate filaments (IF) [1,2,3] are proteins which are primordial components of the cytoskeleton and the nuclear envelope. They generally form filamentous structures 8 to 14 nm wide. IF proteins are members of a very large multigene family of proteins which has been subdivided into five major subgroups:

- Type I: Acidic cytokeratins.
- Type II: Basic cytokeratins.
- Type III: Vimentin, desmin, glial fibrillary acidic protein (GFAP), peripherin, and plasticin.
- Type IV: Neurofilaments L, H and M, alpha-internexin and nestin.
- Type V: Nuclear lamins A, B1, and C.

All IF proteins are structurally similar in that they consist of: a central rod domain comprising some 300 to 350 residues which is arranged in coiled-coil alpha-helices, with at least two short characteristic interruptions; an N-terminal non-helical domain (head) of variable length; and a C-terminal domain (tail) which is also non-helical, and which shows extreme length variation between different IF proteins. While IF proteins are evolutionarily and structurally related, they have limited sequence homologies except in several regions of the rod domain. A conserved region at the C-terminal extremity of the rod domain was used as a sequence pattern for this class of proteins.
[0630] Consensus pattern: [IV]-x-[TACI]-Y-[RKH]-x-[LM]-L-[DE]

[1] Quinlan R., Hutchison C., Lane B. Protein Prof. 2:801-952(1995).
[2] Steinert P.M., Roop D.R. Annu. Rev. Biochem. 57:593-625(1988).
[3] Stewart M. Curr. Opin. Cell Biol. 2:91-100(1990).

[0631] 208. Flavodoxin signature 

Flavodoxins [1,E1] are electron-transfer proteins that function in various electron transport systems. Flavodoxins bind one FMN molecule, which serves as a redox-active prosthetic group. Flavodoxins are functionally interchangeable with ferredoxins. They have been isolated from prokaryotes, cyanobacteria, and some eukaryotic algae. The signature pattern for these proteins is derived from a conserved region in their N-terminal section; this region is involved in the binding of the FMN phosphate group.

[0632] Consensus pattern: [LIV]-[LIVFY]-[FY]-x-[ST]-x(2)-[AGC]-x-T-x(3)-A-x(2)-[LIV]
[1] Wakabayashi S., Kimura K., Matsubara H., Rogers L.J. Biochem. J. 263:981-984(1989).
[0633] 209. Growth factor and cytokines receptors family signatures (fn3)

A number of receptors for lymphokines, hematopoietic growth factors and growth hormone-related molecules have been found [1 to 5] to share a common binding domain. Receptors known to belong to this family are:

- Cytokine receptor common beta chain. This chain is common to the IL-3, IL-5 and GM-CSF receptors.
- Cytokine receptor common gamma chain. This chain is common to the IL-2, IL-4, IL-7 and IL-13 receptors.
- Ciliary neurotrophic factor receptor (CNTFR).
- Erythropoietin receptor (EPOR).
- Granulocyte colony-stimulating factor receptor (G-CSFR).
- Granulocyte-macrophage colony-stimulating factor receptor alpha chain (GM-CSFR).
- Interleukin-2 receptor beta chain (IL2R-beta).
- Interleukin-3 receptor alpha chain (IL3R).
- Interleukin-4 receptor alpha chain (IL4R).
- Interleukin-5 receptor alpha chain (IL5R).
- Interleukin-6 receptor (IL6R).
- Interleukin-7 receptor alpha chain (IL7R).
- Interleukin-9 receptor (IL9R).
- Growth hormone receptor (GRHR).
- Prolactin receptor (PRLR).
- Thrombopoietin receptor (TPOR).

The conserved region constitutes all or part of the extracellular ligand-binding region and is about 200 amino acid residues long. In the N-terminal part of this domain there are two pairs of cysteines known, in the growth hormone receptor, to be involved in disulfide bonds.

[Schematic: extracellular region carrying four conserved cysteines (C C C C), followed by the transmembrane segment and the cytoplasmic region.]

Two patterns to detect this family of receptors were used. The first one is derived from the first N-terminal disulfide loop; the second is a tryptophan-rich pattern located at the C-terminal extremity of the extracellular region.

[0634] Consensus pattern: C-[LVFYR]-x(7,8)-[STIVDN]-C-x-W [The two C's are linked by a disulfide bond]
Consensus pattern: [STGL]-x-W-[SG]-x-W-S

[1] Bazan J.F. Biochem. Biophys. Res. Commun. 164:788-795(1989).
[2] Bazan J.F. Proc. Natl. Acad. Sci. U.S.A. 87:6934-6938(1990).
[3] Cosman D., Lyman S.D., Idzerda R.L., Beckmann M.P., Park L.S., Goodwin R.G., March C.J. Trends Biochem. Sci. 15:265-270(1990).
[4] d'Andrea A.D., Fasman G.D., Lodish H.F. Cell 58:1023-1024(1989).
[5] d'Andrea A.D., Fasman G.D., Lodish H.F. Curr. Opin. Cell Biol. 2:648-651(1990).

[0635] 210. Phosphoribosylglycinamide formyltransferase active site (formyl_transf)

Phosphoribosylglycinamide formyltransferase (EC 2.1.2.2) (GART) [1] catalyzes the third step in de novo purine biosynthesis, the transfer of a formyl group to 5'-phosphoribosylglycinamide. In higher eukaryotes GART is part of a multifunctional enzyme polypeptide that catalyzes three of the steps of purine biosynthesis. In bacteria, plants and yeast, GART is a monofunctional protein of about 200 amino-acid residues. In the Escherichia coli enzyme, an aspartic acid residue has been shown to be involved in the catalytic mechanism. The region around this active site residue is well conserved in GART from prokaryotic and eukaryotic sources and can be used as a signature pattern. Mammalian formyltetrahydrofolate dehydrogenase (EC 1.5.1.6) [2] is a cytosolic enzyme responsible for the NADP-dependent decarboxylative reduction of 10-formyltetrahydrofolate into tetrahydrofolate. It is a protein of about 900 amino acids consisting of three domains; the N-terminal domain (200 residues) is structurally related to GARTs. Escherichia coli methionyl-tRNA formyltransferase (EC 2.1.2.9) (gene fmt) [3] is the enzyme responsible for modifying the free amino group of the aminoacyl moiety of methionyl-tRNA(fMet). The central part of fmt seems to be evolutionarily related to GART's active site region.

[0636] Consensus pattern: G-x-[STM]-[IVT]-x-[FYWVQ]-[VMAT]-x-[DEVM]-x-[LIVMY]-D-x-G-x(2)-[LIVT]-x(6)-[LIVM] [D is the active site residue]

[1] Inglese J., Smith J.M., Benkovic S.J. Biochemistry 29:6678-6687(1990).
[2] Cook R.J., Lloyd R.S., Wagner C. J. Biol. Chem. 266:4965-4973(1991).
[3] Guillon J.-M., Mechulam Y., Schmitter J.-M., Blanquet S., Fayat G. J. Bacteriol. 174:4294-4301(1992).

[0637] 211. G10 protein signatures

A Xenopus protein known as G10 [1] has been found to be highly conserved in a wide range of eukaryotic species. The function of G10 is still unknown. G10 is a protein of about 17 to 18 Kd (143 to 157 residues) which is hydrophilic and whose C-terminal half is rich in cysteines and could be involved in metal-binding. As signature patterns, two of these cysteine-rich segments were selected.

[0638] Consensus pattern: L-C-C-x-[KR]-C-x(4)-[DE]-x-N-x(4)-C-x-C-R-V-P
Consensus pattern: C-x-H-C-G-C-[KRH]-G-C-[SA]

[0639] [1] McGrew L.L., Dworkin-Rastl E., Dworkin M.B., Richter J.D. Genes Dev. 3:803-815(1989).
[0640] 212. G-protein alpha subunit

[0641] G proteins couple receptors of extracellular signals to intracellular signaling pathways. The G protein alpha subunit binds guanyl nucleotide and is a weak GTPase. Number of members: 19^

[1] Coleman DE, Berghuis AM, Lee E, Linder ME, Gilman AG, Sprang SR, Science 1994, 265:1405-1412.
[2] How G proteins work: a continuing story. Coleman DE, Sprang SR, Trends Biochem Sci 1996, 21:41-44.

[0642] 213. Glucose-6-phosphate dehydrogenase active site (G6PD)

Glucose-6-phosphate dehydrogenase (EC 1.1.1.49) (G6PD) [1] catalyzes the first step in the pentose pathway, the reduction of glucose-6-phosphate to gluconolactone 6-phosphate. A lysine residue has been identified as a reactive nucleophile associated with the activity of the enzyme. The sequence around this lysine is totally conserved from bacterial to mammalian G6PD's and can be used as a signature pattern.
[0643] Consensus pattern: D-H-Y-L-G-K-[EQK] [K is the active site residue]

[0644] [1] Jeffery J., Persson B., Wood I., Bergman T., Jeffery R., Joernvall H. Eur. J. Biochem. 212:41-49(1993).
[0645] 214. GATA-type zinc finger domain
The GATA family of transcription factors are proteins that bind to DNA sites with the consensus sequence (A/T)GATA(A/G), found within the regulatory region of a number of genes. Proteins currently known to belong to this family are:

- GATA-1 [1] (also known as Eryf1, GF-1 or NF-E1), which binds to the GATA region of globin genes and other genes expressed in erythroid cells. It is a transcriptional activator which probably serves as a general 'switch' factor for erythroid development.
- GATA-2 [2], a transcriptional activator which regulates endothelin-1 gene expression in endothelial cells.
- GATA-3 [3], a transcriptional activator which binds to the enhancer of the T-cell receptor alpha and delta genes.
- GATA-4 [4], a transcriptional activator expressed in endodermally derived tissues and heart.
- Drosophila protein pannier (or DGATAa) (gene pnr), which acts as a repressor of the achaete-scute complex (as-c).
- Bombyx mori BCFI [5], which regulates the expression of chorion genes.
- Caenorhabditis elegans elt-1 and elt-2, transcriptional activators of genes containing the GATA region, including vitellogenin genes [6].
- Ustilago maydis urbs1 [7], a protein involved in the repression of the biosynthesis of siderophores.
- Fission yeast protein GAF2.

All these transcription factors contain a pair of highly similar 'zinc finger'-type domains with the consensus sequence C-x2-C-x17-C-x2-C. Some other proteins contain a single zinc finger motif highly related to those of the GATA transcription factors. These proteins are:

- Drosophila box A-binding factor (ABF) (also known as protein serpent (gene srp)), which may function as a transcriptional activator protein and may play a key role in the organogenesis of the fat body.
- Emericella nidulans areA [8], a transcriptional activator which mediates nitrogen metabolite repression.
- Neurospora crassa nit-2 [9], a transcriptional activator which turns on the expression of genes coding for enzymes required for the use of a variety of secondary nitrogen sources during conditions of nitrogen limitation.
- Neurospora crassa white collar proteins 1 and 2 (WC-1 and WC-2), which control expression of light-regulated genes.
- Saccharomyces cerevisiae DAL80 (or UGA43), a negative nitrogen regulatory protein.
- Saccharomyces cerevisiae GLN3, a positive nitrogen regulatory protein.
- Saccharomyces cerevisiae GAT1.
- Saccharomyces cerevisiae GZF3.

[0646] Consensus pattern: C-x-[DN]-C-x(4,5)-[ST]-x(2)-W-[HR]-[RK]-x(3)-[GN]-x(3,4)-C-N-[AS]-C [The four C's are zinc ligands]
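The spacing between the four zinc-ligand cysteines can be checked against the C-x2-C-x17-C-x2-C consensus by summing the lengths of the pattern elements lying between consecutive cysteines. The following Python sketch does this arithmetic; it assumes the consensus pattern reads as written in the code, and the helper names element_lengths and cysteine_spacings are hypothetical.

```python
import re

def element_lengths(element):
    """Return (min, max) number of residues matched by one pattern element."""
    m = re.fullmatch(r"(?:x|[A-Z]|\[[A-Z]+\])(?:\((\d+)(?:,(\d+))?\))?", element)
    if m is None:
        raise ValueError(f"unrecognised element: {element!r}")
    lo = int(m.group(1)) if m.group(1) else 1
    hi = int(m.group(2)) if m.group(2) else lo
    return lo, hi

def cysteine_spacings(pattern):
    """List (min, max) residue counts between consecutive literal C elements."""
    spacings, lo, hi, seen_c = [], 0, 0, False
    for element in pattern.split("-"):
        if element == "C":
            if seen_c:
                spacings.append((lo, hi))
            seen_c, lo, hi = True, 0, 0
        else:
            a, b = element_lengths(element)
            lo, hi = lo + a, hi + b
    return spacings

# Assumed reading of the GATA-type zinc finger consensus pattern.
GATA = "C-x-[DN]-C-x(4,5)-[ST]-x(2)-W-[HR]-[RK]-x(3)-[GN]-x(3,4)-C-N-[AS]-C"
print(cysteine_spacings(GATA))
```

Under that reading, the outer cysteine pairs are each separated by two residues and the central loop spans 17 to 19 residues, so the x17 of the consensus corresponds to the shortest loop the pattern allows.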

[1] Trainor C.D., Evans T., Felsenfeld G., Boguski M.S. Nature 343:92-96(1990).
[2] Lee M.E., Temizer D.H., Clifford J.A., Quertermous T. J. Biol. Chem. 266:16188-16192(1991).
[3] Ho I.-C., Vorhees P., Marin N., Oakley B.K., Tsai S.-F., Orkin S.H., Leiden J.M. EMBO J. 10:1187-1192(1991).
[4] Spieth J., Shim Y.H., Lea K., Conrad R., Blumenthal T. Mol. Cell. Biol. 11:4651-4659(1991).
[5] Drevet J.R., Skeiky Y.A., Iatrou K. J. Biol. Chem. 269:10660-10667(1994).
[6] Hawkins M.G., McGhee J.D. J. Biol. Chem. 270:14666-14671(1995).
[7] Voisard C.P.O., Wang J., Xu P., Leong S.A., McEvoy J.L. Mol. Cell. Biol. 13:7091-7100(1993).
[8] Arst H.N. Jr., Kudla B., Martinez-Rossi N.M., Caddick M.X., Sibley S., Davies R.W. Trends Genet. 5:291-291(1989).
[9] Fu Y.-H., Marzluf G.A. Mol. Cell. Biol. 10:1056-1065(1990).

[0647] 215. Glutamine amidotransferases class-I active site (GATase)

A large group of biosynthetic enzymes are able to catalyze the removal of the ammonia group from glutamine and then to transfer this group to a substrate to form a new carbon-nitrogen group. This catalytic activity is known as glutamine amidotransferase (GATase) (EC 2.4.2.-) [1]. The GATase domain exists either as a separate polypeptidic subunit or as part of a larger polypeptide fused in different ways to a synthase domain. On the basis of sequence similarities two classes of GATase domains have been identified [2,3]: class-I (also known as trpG-type) and class-II (also known as purF-type). Class-I GATase domains have been found in the following enzymes:

- The second component of anthranilate synthase (AS) (EC 4.1.3.27) [4]. AS catalyzes the biosynthesis of anthranilate from chorismate and glutamine. AS is generally a dimeric enzyme: the first component can synthesize anthranilate using ammonia rather than glutamine, whereas component II provides the GATase activity. In some bacteria and in fungi the GATase component of AS is part of a multifunctional protein that also catalyzes other steps of the biosynthesis of tryptophan.
- The second component of 4-amino-4-deoxychorismate (ADC) synthase (EC 4.1.3.-), a dimeric prokaryotic enzyme that functions in the pathway that catalyzes the biosynthesis of para-aminobenzoate (PABA) from chorismate and glutamine. The second component (gene pabA) provides the GATase activity [4].
- CTP synthase (EC 6.3.4.2). CTP synthase catalyzes the final reaction in the biosynthesis of pyrimidine, the ATP-dependent formation of CTP from UTP and glutamine. CTP synthase is a single chain enzyme that contains two distinct domains; the GATase domain is in the C-terminal section [2].
- GMP synthase (glutamine-hydrolyzing) (EC 6.3.5.2). GMP synthase catalyzes the ATP-dependent formation of GMP from xanthosine 5'-phosphate and glutamine. GMP synthase is a single chain enzyme that contains two distinct domains; the GATase domain is in the N-terminal section [5].
- Glutamine-dependent carbamoyl-phosphate synthase (EC 6.3.5.5) (GD-CPSase), an enzyme involved in both arginine and pyrimidine biosynthesis and which catalyzes the ATP-dependent formation of carbamoyl phosphate from glutamine and carbon dioxide. In bacteria GD-CPSase is composed of two subunits: the large chain (gene carB) provides the CPSase activity, while the small chain (gene carA) provides the GATase activity. In yeast the enzyme involved in arginine biosynthesis is also composed of two subunits: CPA1 (GATase) and CPA2 (CPSase). In most eukaryotes, the first three steps of pyrimidine biosynthesis are catalyzed by a large multifunctional enzyme (called URA2 in yeast, rudimentary in Drosophila, and CAD in mammals). The GATase domain is located at the N-terminal extremity of this polyprotein [6].
- Phosphoribosylformylglycinamidine synthase II (EC 6.3.5.3), an enzyme that catalyzes the fourth step in the de novo biosynthesis of purines. In some species of bacteria, FGAM synthase II is composed of two subunits: a small chain (gene purQ) which provides the GATase activity and a large chain (gene purL) which provides the aminator activity.
- The histidine amidotransferase hisH, an enzyme that catalyzes the fifth step in the biosynthesis of histidine in prokaryotes.

In the second component of AS a cysteine has been shown [7] to be essential for the amidotransferase activity. The sequence around this residue is well conserved in all the above GATase domains and can be used as a signature pattern for class-I GATase.

[0648] Consensus pattern: [PAS]-[LIVMFYT]-[LIVMFY]-G-[LIVMFY]-C-[LIVMFYN]-G-x-[QEH]-x-[LIVMFA] [C is the active site residue]

[1] Buchanan J.M. Adv. Enzymol. 39:91-183(1973).
[2] Weng M., Zalkin H. J. Bacteriol. 169:3023-3028(1987).
[3] Nyunoya H., Lusty C.J. J. Biol. Chem. 259:9790-9798(1984).
[4] Crawford I.P. Annu. Rev. Microbiol. 43:567-600(1989).
[5] Zalkin H., Argos P., Narayana S.V.L., Tiedeman A.A., Smith J.M. J. Biol. Chem. 260:3350-3354(1985).
[6] Davidson J.N., Chen K.C., Jamison R.S., Musmanno L.A., Kern C.B. BioEssays 15:157-164(1993).
[7] Tso J.Y., Hermodson M.A., Zalkin H. J. Biol. Chem. 255:1451-1457(1980).

[0649] 216. Glutamine amidotransferases class-II active site (GATase_2)

A large group of biosynthetic enzymes are able to catalyze the removal of the ammonia group from glutamine and then to transfer this group to a substrate to form a new carbon-nitrogen group. This catalytic activity is known as glutamine amidotransferase (GATase) (EC 2.4.2.-) [1]. The GATase domain exists either as a separate polypeptide subunit or as part of a larger polypeptide fused in different ways to a synthase domain. On the basis of sequence similarities two classes of GATase domains have been identified [2,3]: class-I (also known as trpG-type) and class-II (also known as purF-type). Class-II GATase domains have been found in the following enzymes:

- Amido phosphoribosyltransferase (glutamine phosphoribosylpyrophosphate amidotransferase) (EC 2.4.2.14). An enzyme which catalyzes the first step in purine biosynthesis, the transfer of the ammonia group of glutamine to PRPP to form 5-phosphoribosylamine (gene purF in bacteria, ADE4 in yeast).
- Glucosamine-fructose-6-phosphate aminotransferase (EC 2.6.1.16). This enzyme catalyzes a key reaction in amino sugar synthesis, the formation of glucosamine 6-phosphate from fructose 6-phosphate and glutamine (gene glmS in Escherichia coli, nodM in Rhizobium, GFA1 in yeast).
- Asparagine synthetase (glutamine-hydrolyzing) (EC 6.3.5.4). This enzyme is responsible for the synthesis of asparagine from aspartate and glutamine.

A cysteine is present at the N-terminal extremity of the mature form of all these enzymes. The cysteine has been shown, in amido phosphoribosyltransferase [4] and in asparagine synthetase [5], to be important for the catalytic mechanism.

[0650] Consensus pattern: <x(0,11)-C-[GS]-[IV]-[LIVMFYW]-[AG] [C is the active site residue]
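In the PROSITE notation a leading '<' anchors a pattern to the N-terminus of the sequence, so this signature only matches when the cysteine lies within the first twelve residues. The sketch below, given purely as an illustration, writes the class-II pattern by hand as an anchored regular expression and reports the one-based position of the candidate active-site cysteine. The function name and the test sequences are hypothetical.

```python
import re

# '<' becomes '^'; x(0,11) becomes '.{0,11}'. The cysteine is captured so that
# its position can be reported. The lazy quantifier picks the earliest cysteine
# that satisfies the rest of the pattern.
CLASS_II_GATASE = re.compile(r"^.{0,11}?(C)[GS][IV][LIVMFYW][AG]")

def active_site_cysteine(sequence):
    """Return the 1-based position of the matching cysteine, or None if no match."""
    m = CLASS_II_GATASE.search(sequence)
    return None if m is None else m.start(1) + 1

# Hypothetical sequences: a mature form starting with the cysteine, a form with
# one leading residue, and a form where the motif lies too far from the N-terminus.
print(active_site_cysteine("CGIVGAKL"))
print(active_site_cysteine("MCGIVGAKL"))
print(active_site_cysteine("A" * 15 + "CGIVGAKL"))
```

Anchoring matters here because the same five-residue motif occurring in the interior of a protein would not correspond to the N-terminal cysteine described above.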

[1] Buchanan J.M. Adv. Enzymol. 39:91-183(1973).
[2] Weng M., Zalkin H. J. Bacteriol. 169:3023-3028(1987).
[3] Nyunoya H., Lusty C.J. J. Biol. Chem. 259:9790-9798(1984).
[4] van Heeke G., Schuster M. J. Biol. Chem. 264:5503-5509(1989).
[5] Vollmer S.J., Switzer R.L., Hermodson M.A., Bower S.G., Zalkin H. J. Biol. Chem. 258:10582-10585(1983).

[0651] 217. GDP dissociation inhibitor (GDI) 

[0652] [1] Schalk I, Zeng K, Wu SK, Stura EA, Matteson J, Huang M, Tandon A, Wilson IA, Balch WE, Nature 1996, 381:42-48.

[0653] 218. Oxidoreductase family (GFO_IDH_MocA)

[0654] This family of enzymes utilises NADP or NAD. This family is called the GFO/IDH/MOCA family in Swiss-Prot.
[0655] [1] Kingston RL, Scopes RK, Baker EN, Structure 1996, 4:1413-1428.
[0656] 219. GHMP kinases putative ATP-binding domain 

The following kinases contain, in their N-terminal section, a conserved Gly/Ser-rich region which is probably involved in the binding of ATP [1]. These kinases are listed below.

- Galactokinase (EC 2.7.1.6).
- Homoserine kinase (EC 2.7.1.39).
- Mevalonate kinase (EC 2.7.1.36).
- Phosphomevalonate kinase (EC 2.7.4.2).

This group of kinases was called 'GHMP' (from the first letter of their substrates).

Consensus pattern: [LIVM]-[PK]-x-[GSTA]-x(0,1)-G-L-[GS]-S-S-[GSA]-[GSTAC]
[0657] [1] Tsay Y.H., Robinson G.W. Mol. Cell. Biol. 11:620-631(1991).

[0658] 220. Glucose inhibited division protein A family signatures (GIDA)

Bacterial glucose inhibited division protein A (gene gidA) is a protein of 70 Kd whose function is not yet known and whose sequence is highly conserved. It is evolutionarily related to yeast hypothetical protein YGL236C, Caenorhabditis elegans hypothetical protein F52H3.2 and a Bacillus subtilis protein called gid (and which is different from B. subtilis gidA). Two highly conserved regions were selected as signature patterns. Both regions are located in the central region of the protein.

[0659] Consensus pattern: [GS]-[PT]-x-Y-C-P-S-[LIVM]-E-x-K-[LIVM]-x-[KR]

Consensus pattern: A-G-Q-x-[NT]-G-x(2)-G-Y-x-E-[SAG](3)-[QS]-G-[LIVM](2)-A-G-[LIVMT]-N-A

[0660] 221. (GLFV_dehydrog)
Glu / Leu / Phe / Val dehydrogenases active site

- Glutamate dehydrogenases (EC 1.4.1.2, EC 1.4.1.3, and EC 1.4.1.4) (GluDH) are enzymes that catalyze the NAD- or NADP-dependent reversible deamination of glutamate into alpha-ketoglutarate [1,2]. GluDH isozymes are generally involved with either ammonia assimilation or glutamate catabolism.
- Leucine dehydrogenase (EC 1.4.1.9) (LeuDH) is a NAD-dependent enzyme that catalyzes the reversible deamination of leucine and several other aliphatic amino acids to their keto analogues [3].
- Phenylalanine dehydrogenase (EC 1.4.1.20) (PheDH) is a NAD-dependent enzyme that catalyzes the reversible deamination of L-phenylalanine into phenylpyruvate [4].
- Valine dehydrogenase (EC 1.4.1.8) (ValDH) is a NADP-dependent enzyme that catalyzes the reversible deamination of L-valine into 3-methyl-2-oxobutanoate [5].

[0661] These dehydrogenases are structurally and functionally related. A conserved lysine residue located in a glycine-rich region has been implicated in the catalytic mechanism. The conservation of the region around this residue allows the derivation of a signature pattern for such type of enzymes.

[0662] Consensus pattern: [LIV]-x(2)-G-G-[SAG]-K-x-[GV]-x(3)-[DNST]-[PL] [K is the active site residue]. Sequences known to belong to this class detected by the pattern: ALL.

[0663] Note: all known sequences from this family have Pro in the last position of the pattern with the exception of yeast GluDH which has Leu.

[1] Britton K.L., Baker P.J., Rice D.W., Stillman T.J. Eur. J. Biochem. 209:851-859(1992).
[2] Benachenhou-Lahfa N., Forterre P., Labedan B. J. Mol. Evol. 36:335-346(1993).
[3] Nagata S., Tanizawa K., Esaki N., Sakamoto Y., Ohshima T., Tanaka H., Soda K. Biochemistry 27:9056-9062(1988).
[4] Takada H., Yoshimura T., Ohshima T., Esaki N., Soda K. J. Biochem. 109:371-376(1991).
[5] Tang L., Hutchinson C.R. J. Bacteriol. 175:4176-4185(1993).

[0664] 222. GMC oxidoreductases signatures

The following FAD flavoproteins oxidoreductases have been found [1,2] to be evolutionarily related. These enzymes, which are called 'GMC oxidoreductases', are listed below.

- Glucose oxidase (EC 1.1.3.4) (GOX) from Aspergillus niger. Reaction catalyzed: glucose + oxygen -> delta-gluconolactone + hydrogen peroxide.
- Methanol oxidase (EC 1.1.3.13) (MOX) from fungi. Reaction catalyzed: methanol + oxygen -> acetaldehyde + hydrogen peroxide.
- Choline dehydrogenase (EC 1.1.99.1) (CHD) from bacteria. Reaction catalyzed: choline + unknown acceptor -> betaine acetaldehyde + reduced acceptor.
- Glucose dehydrogenase (GLD) (EC 1.1.99.10) from Drosophila. Reaction catalyzed: glucose + unknown acceptor -> delta-gluconolactone + reduced acceptor.
- Cholesterol oxidase (CHOD) (EC 1.1.3.6) from Brevibacterium sterolicum and Streptomyces strain SA-COO. Reaction catalyzed: cholesterol + oxygen -> cholest-4-en-3-one + hydrogen peroxide.
- AlkJ [3], an alcohol dehydrogenase from Pseudomonas oleovorans, which converts aliphatic medium-chain-length alcohols into aldehydes.

This family also includes a lyase:

- (R)-mandelonitrile lyase (EC 4.1.2.10) (hydroxynitrile lyase) from plants [4], an enzyme involved in cyanogenesis, the release of hydrogen cyanide from injured tissues.

These enzymes are proteins of size ranging from 556 (CHD) to 664 (MOX) amino acid residues which share a number of regions of sequence similarities. One of these regions, located in the N-terminal section, corresponds to the FAD ADP-binding domain. The function of the other conserved domains is not yet known; two of these domains were selected as signature patterns. The first one is located in the N-terminal section of these enzymes, about 60 residues after the ADP-binding domain, while the second one is located in the central section.

[0665] Consensus pattern: [GA]-[RKN]-x-[LIV]-G(2)-[GST](2)-x-[LIVM]-N-x(3)-[FYWA]-x(2)-[PAG]-x(5)-[DNESH]
Consensus pattern: [GS]-[PSTA]-x(2)-[ST]-P-x-[LIVM](2)-x(2)-S-G-[LIVM]-G

[1] Cavener D.R. J. Mol. Biol. 223:811-814(1992).
[2] Henikoff S., Henikoff J.G. Genomics 19:97-107(1994).
[3] van Beilen J.B., Eggink G., Enequist H., Bos R., Witholt B. Mol. Microbiol. 6:3121-3136(1992).
[4] Cheng I.P., Poulton J.E. Plant Cell Physiol. 34:1139-1143(1993).

[0666] 223. (GMP_synt_C)
Glutamine amidotransferases class-I active site
[0667] A large group of biosynthetic enzymes are able to catalyze the removal of the ammonia group from glutamine and then to transfer this group to a substrate to form a new carbon-nitrogen group. This catalytic activity is known as glutamine amidotransferase (GATase) (EC 2.4.2.-) [1]. The GATase domain exists either as a separate polypeptide subunit or as part of a larger polypeptide fused in different ways to a synthase domain. On the basis of sequence similarities two classes of GATase domains have been identified [2,3]: class-I (also known as trpG-type) and class-II (also known as purF-type). Class-I GATase domains have been found in the following enzymes:

- The second component of anthranilate synthase (AS) (EC 4.1.3.27) [4]. AS catalyzes the biosynthesis of anthranilate from chorismate and glutamine. AS is generally a dimeric enzyme: the first component can synthesize anthranilate using ammonia rather than glutamine, whereas component II provides the GATase activity. In some bacteria and in fungi the GATase component of AS is part of a multifunctional protein that also catalyzes other steps of the biosynthesis of tryptophan.
- The second component of 4-amino-4-deoxychorismate (ADC) synthase (EC 4.1.3.-), a dimeric prokaryotic enzyme that functions in the pathway that catalyzes the biosynthesis of para-aminobenzoate (PABA) from chorismate and glutamine. The second component (gene pabA) provides the GATase activity [4].
- CTP synthase (EC 6.3.4.2). CTP synthase catalyzes the final reaction in the biosynthesis of pyrimidine, the ATP-dependent formation of CTP from UTP and glutamine. CTP synthase is a single chain enzyme that contains two distinct domains; the GATase domain is in the C-terminal section [2].
- GMP synthase (glutamine-hydrolyzing) (EC 6.3.5.2). GMP synthase catalyzes the ATP-dependent formation of GMP from xanthosine 5'-phosphate and glutamine. GMP synthase is a single chain enzyme that contains two distinct domains; the GATase domain is in the N-terminal section [5].
- Glutamine-dependent carbamoyl-phosphate synthase (EC 6.3.5.5) (GD-CPSase), an enzyme involved in both arginine and pyrimidine biosynthesis and which catalyzes the ATP-dependent formation of carbamoyl phosphate from glutamine and carbon dioxide. In bacteria GD-CPSase is composed of two subunits: the large chain (gene carB) provides the CPSase activity, while the small chain (gene carA) provides the GATase activity. In yeast the enzyme involved in arginine biosynthesis is also composed of two subunits: CPA1 (GATase) and CPA2 (CPSase). In most eukaryotes the first three steps of pyrimidine biosynthesis are catalyzed by a large multifunctional enzyme (called URA2 in yeast, rudimentary in Drosophila, and CAD in mammals). The GATase domain is located at the N-terminal extremity of this polyprotein [6].
- Phosphoribosylformylglycinamidine synthase II (EC 6.3.5.3), an enzyme that catalyzes the fourth step in the de novo biosynthesis of purines. In some species of bacteria, FGAM synthase II is composed of two subunits: a small chain (gene purQ) which provides the GATase activity and a large chain (gene purL) which provides the aminator activity.
- The histidine amidotransferase hisH, an enzyme that catalyzes the fifth step in the biosynthesis of histidine in prokaryotes.

[0668] In the second component of AS a cysteine has been shown [7] to be essential for the amidotransferase activity. The sequence around this residue is well conserved in all the above GATase domains and can be used as a signature pattern for class-I GATase.

[0669] Consensus pattern: [PAS]-[LIVMFYT]-[LIVMFY]-G-[LIVMFY]-C-[LIVMFYN]-G-x-[QEH]-x-[LIVMFA] [C is the active site residue]. Sequences known to belong to this class detected by the pattern: ALL, except for 6 sequences.
[0670] Note: in the first position of the pattern Pro is found in all cases except in the slime mold GD-CPSase where it is replaced by Ala.

[1] Buchanan J.M. Adv. Enzymol. 39:91-183(1973).
[2] Weng M., Zalkin H. J. Bacteriol. 169:3023-3028(1987).
[3] Nyunoya H., Lusty C.J. J. Biol. Chem. 259:9790-9798(1984).
[4] Crawford I.P. Annu. Rev. Microbiol. 43:567-600(1989).
[5] Zalkin H., Argos P., Narayana S.V.L., Tiedeman A.A., Smith J.M. J. Biol. Chem. 260:3350-3354(1985).
[6] Davidson J.N., Chen K.C., Jamison R.S., Musmanno L.A., Kern C.B. BioEssays 15:157-164(1993).
[7] Tso J.Y., Hermodson M.A., Zalkin H. J. Biol. Chem. 255:1451-1457(1980).

[0671] 224. Glutathione peroxidases signatures (GSHPx)

Glutathione peroxidase (EC 1.11.1.9) (GSHPx) [1,2] is an enzyme that catalyzes the reduction of hydroxyperoxides by glutathione. Its main function is to protect against the damaging effect of endogenously formed hydroxyperoxides. In higher vertebrates at least four forms of GSHPx are known to exist: a ubiquitous cytosolic form (GSHPx-1), a gastrointestinal cytosolic form (GSHPx-GI) [3], a plasma secreted form (GSHPx-P) [4], and an epididymal secretory form (GSHPx-EP). In addition to these characterized forms, the sequence of a protein of unknown function [5] has been shown to be evolutionarily related to those of GSHPx's. In filarial nematode parasites such as Brugia pahangi the major soluble cuticular protein, known as gp29, is a secreted GSHPx which could provide a mechanism of resistance to the immune reaction of the mammalian host by neutralizing the products of the oxidative burst of leukocytes [6]. Escherichia coli protein btuE, a periplasmic protein involved in the transport of vitamin B12, is also evolutionarily related to GSHPx's; the significance of this relationship is not yet clear. Selenium, in the form of selenocysteine [7], is part of the catalytic site of GSHPx. The sequence around the selenocysteine residue is moderately well conserved in GSHPx's and the related proteins and can be used as a signature pattern. As a second signature for this family of proteins, a highly conserved octapeptide located in the central section of these proteins was selected.
[0672] Consensus pattern: [GN]-[RKHNFYC]-x-[LIVMFC]-[LIVMF](2)-x-N-[VT]-x-[STC]-x-C-[GA]-x-T [C is the active site selenocysteine residue]
Consensus pattern: [LIV]-[AGD]-F-P-[CS]-[NG]-Q

[1] Mannervik B. Meth. Enzymol. 113:490-495(1985).
[2] Mullenbach G.T., Tabrizi A., Irvine B.D., Bell G.I., Tainer J.A., Hallewell R.A. Protein Eng. 2:239-246(1988).
[3] Chu F.F., Doroshow J.H., Esworthy R.S. J. Biol. Chem. 268:2571-2576(1993).
[4] Takahashi K., Akasaka M., Yamamoto Y., Kobayashi C., Mizoguchi J., Koyama J. J. Biochem. 108:145-148(1990).
[5] Dunn D.K., Howells D.D., Richardson J., Goldfarb P.S. Nucleic Acids Res. 17:6390-6390(1989).
[6] Cookson E., Blaxter M.L., Selkirk M.E. Proc. Natl. Acad. Sci. U.S.A. 89:5837-5841(1992).
[7] Stadtman T.C. Annu. Rev. Biochem. 59:111-127(1990).

[0673] 225. (GST)
Glutathione S-transferases

[0674] Function: conjugation of reduced glutathione to a variety of targets. Also included in the alignment, but not GSTs:
- S-crystallins from squid. Similarity to GST was previously noted.
- Eukaryotic elongation factors 1-gamma. Not known to have GST activity; similarity not previously recognized. Supported by HMM and manual alignment inspection.
- HSP26 family of stress-related proteins, including auxin-regulated proteins in plants and stringent starvation proteins in E. coli. Not known to have GST activity; similarity not previously recognized. Supported by HMM and manual alignment inspection. Alignment spans entire protein.
[0675] 226. GTP1/OBG family signature

A widespread family of GTP-binding proteins has been recently characterized [1,2]. This family currently includes:
- Mouse and Xenopus protein DRG.
- Human protein DRG2.
- Drosophila protein 128up.
- Fission yeast protein gtp1.
- A Halobacterium cutirubrum hypothetical protein in a ribosomal protein gene cluster.
- Bacillus subtilis protein obg. Obg has been experimentally shown to bind GTP.
- Escherichia coli hypothetical protein yhbZ.
- Haemophilus influenzae hypothetical protein HI0877.
- Mycoplasma genitalium hypothetical protein MG384.
- Yeast hypothetical protein YAL036c (FUN11).
- Yeast hypothetical protein YGR173w.
- Caenorhabditis elegans hypothetical protein C02F5.3.
The function of the proteins that belong to this family is not yet known. They are polypeptides of about 40 to 48 Kd which contain the five small sequence elements characteristic of GTP-binding proteins [3]. As a signature pattern, the region that corresponds to the ATP/GTP B motif (also called G-3 in GTP-binding proteins) was selected.
[0676] Consensus pattern: D-[LIVM]-P-G-[LIVM](2)-[DEY]-[GN]-A-x(2)-G-x-G
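
The consensus patterns in this section use the PROSITE notation: elements separated by hyphens, `x` for any residue, `[...]` for a set of allowed residues, and a count or range in parentheses. The following is a minimal sketch, not part of the original disclosure, of how such a pattern could be translated into a regular expression and applied to a sequence. The function name `prosite_to_regex` and the test sequence are hypothetical, and the N- and C-terminal anchors `<` and `>` are not handled.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        element = element.strip()
        if not element:
            continue  # tolerate a trailing hyphen
        # Separate an optional repeat suffix such as (2) or (16,20) from the core.
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.group(1), m.group(2), m.group(3)
        if core == "x":
            core = "."                      # any residue
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"  # excluded residues
        # [ABC] sets and single letters are already valid regex syntax.
        if low and high:
            core += "{%s,%s}" % (low, high)
        elif low:
            core += "{%s}" % low
        parts.append(core)
    return "".join(parts)

# GTP1/OBG signature as given in the text.
OBG_PATTERN = "D-[LIVM]-P-G-[LIVM](2)-[DEY]-[GN]-A-x(2)-G-x-G"
OBG_REGEX = prosite_to_regex(OBG_PATTERN)

# Hypothetical sequence fragment, used only to exercise the pattern.
hit = re.search(OBG_REGEX, "MKDLPGLIEGAHKGVGLT")
print(OBG_REGEX, hit.group(0) if hit else None)
```

The same helper can be applied to the other consensus patterns in this section, including those with variable-length gaps such as x(2,3) or x(16,20).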

[1] Sazuka T., Tomooka Y., Ikawa Y., Noda M., Kumar S. Biochem. Biophys. Res. Commun. 189:363-370(1992).
[2] Hudson J.D., Young P.G. Gene 125:191-193(1993).
[3] Bourne H.R., Sanders D.A., McCormick F. Nature 349:117-127(1991).

[0677] 227. (GTP_EFTU1)
ATP/GTP-binding site motif A (P-loop)

[0678] From sequence comparisons and crystallographic data analysis it has been shown [1,2,3,4,5,6] that an appreciable proportion of proteins that bind ATP or GTP share a number of more or less conserved sequence motifs. The best conserved of these motifs is a glycine-rich region, which typically forms a flexible loop between a beta-strand and an alpha-helix. This loop interacts with one of the phosphate groups of the nucleotide. This sequence motif is generally referred to as the 'A' consensus sequence [1] or the 'P-loop' [5]. There are numerous ATP- or GTP-binding proteins in which the P-loop is found. Listed below are a number of protein families for which the relevance of the presence of such motif has been noted:
- ATP synthase alpha and beta subunits (see <PDOC00137>).
- Myosin heavy chains.
- Kinesin heavy chains and kinesin-like proteins (see <PDOC00343>).
- Dynamins and dynamin-like proteins (see <PDOC00362>).
- Guanylate kinase (see <PDOC00670>).
- Thymidine kinase (see <PDOC00524>).
- Thymidylate kinase (see <PDOC01034>).
- Shikimate kinase (see <PDOC00868>).
- Nitrogenase iron protein family (nifH/frxC) (see <PDOC00580>).
- ATP-binding proteins involved in 'active transport' (ABC transporters) [7] (see <PDOC00185>).
- DNA and RNA helicases [8,9,10].
- GTP-binding elongation factors (EF-Tu, EF-1alpha, EF-G, EF-2, etc.).
- Ras family of GTP-binding proteins (Ras, Rho, Rab, Ral, Ypt1, SEC4, etc.).
- Nuclear protein ran (see <PDOC00859>).
- ADP-ribosylation factors family (see <PDOC00781>).
- Bacterial dnaA protein (see <PDOC00771>).
- Bacterial recA protein (see <PDOC00131>).
- Bacterial recF protein (see <PDOC00539>).
- Guanine nucleotide-binding proteins alpha subunits (Gi, Gs, Gt, G0, etc.).
- DNA mismatch repair proteins mutS family (see <PDOC00388>).
- Bacterial type II secretion system protein E (see <PDOC00567>).
Not all ATP- or GTP-binding proteins are picked up by this motif. A number of proteins escape detection because the structure of their ATP-binding site is completely different from that of the P-loop. Examples of such proteins are the E1-E2 ATPases or the glycolytic kinases. In other ATP- or GTP-binding proteins the flexible loop exists in a slightly different form; this is the case for tubulins or protein kinases. A special mention must be reserved for adenylate kinase, in which there is a single deviation from the P-loop pattern: in the last position Gly is found instead of Ser or Thr.

Consensus pattern: [AG]-x(4)-G-K-[ST]
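
As an illustration only, the P-loop pattern [AG]-x(4)-G-K-[ST] can be written directly as a regular expression and used to scan a protein sequence. The sketch below is not part of the original disclosure; the function name `find_p_loops` and both test sequences are hypothetical. A lookahead is used so that overlapping hits are also reported.

```python
import re

# [AG]-x(4)-G-K-[ST] as a regular expression; the lookahead wrapper lets
# finditer() report hits that overlap one another.
P_LOOP = re.compile(r"(?=([AG].{4}GK[ST]))")

def find_p_loops(sequence: str):
    """Return (1-based start position, matched segment) for every P-loop hit."""
    seq = sequence.upper()
    return [(m.start() + 1, m.group(1)) for m in P_LOOP.finditer(seq)]

# Hypothetical fragment containing one P-loop.
print(find_p_loops("MKLAGPSGSGKSTLVRA"))

# Hypothetical adenylate-kinase-like fragment: Gly in the last position
# instead of Ser or Thr, so the pattern reports nothing.
print(find_p_loops("GGPGSGKGT"))
```

The second call illustrates the deviation noted in the text for adenylate kinase: a sequence with Gly in the final position is not picked up by the pattern.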


[1] Walker J.E., Saraste M., Runswick M.J., Gay N.J. EMBO J. 1:945-951(1982).
[2] Moller W., Amons R. FEBS Lett. 186:1-7(1985).
[3] Fry D.C., Kuby S.A., Mildvan A.S. Proc. Natl. Acad. Sci. U.S.A. 83:907-911(1986).
[4] Dever T.E., Glynias M.J., Merrick W.C. Proc. Natl. Acad. Sci. U.S.A. 84:1814-1818(1987).
[5] Saraste M., Sibbald P.R., Wittinghofer A. Trends Biochem. Sci. 15:430-434(1990).
[6] Koonin E.V. J. Mol. Biol. 229:1165-1174(1993).
[7] Higgins C.F., Hyde S.C., Mimmack M.M., Gileadi U., Gill D.R., Gallagher M.P. J. Bioenerg. Biomembr. 22:571-592(1990).
[8] Hodgman T.C. Nature 333:22-23(1988) and Nature 333:578-578(1988) (Errata).
[9] Linder P., Lasko P., Ashburner M., Leroy P., Nielsen P.J., Nishi K., Schnier J., Slonimski P.P. Nature 337:121-122(1989).
[10] Gorbalenya A.E., Koonin E.V., Donchenko A.P., Blinov V.M. Nucleic Acids Res. 17:4713-4730(1989).

[0679] GTP-binding elongation factors signature (GTP_EFTU2) 

Elongation factors [1,2] are proteins catalyzing the elongation of peptide chains in protein biosynthesis. In both prokaryotes and eukaryotes, there are three distinct types of elongation factors, as described in the following table:

Eukaryotes   Prokaryotes   Function
EF-1alpha    EF-Tu         Binds GTP and an aminoacyl-tRNA; delivers the latter to the A site of ribosomes.
EF-1beta     EF-Ts         Interacts with EF-1a/EF-Tu to displace GDP and thus allows the regeneration of GTP-EF-1a.
EF-2         EF-G          Binds GTP and peptidyl-tRNA and translocates the latter from the A site to the P site.

The GTP-binding elongation factor family also includes the following proteins:
- Eukaryotic peptide chain release factor GTP-binding subunits [3]. These proteins interact with release factors that bind to ribosomes that have encountered a stop codon at their decoding site and help them to induce release of the nascent polypeptide. The yeast protein was known as SUP2 (and also as SUP35, SUF12 or GST1) and the human homolog as GST1-Hs.
- Prokaryotic peptide chain release factor 3 (RF-3) (gene prfC). RF-3 is a class-II RF, a GTP-binding protein that interacts with class I RFs (see <PDOC00607>) and enhances their activity [4].
- Prokaryotic GTP-binding protein lepA and its homolog in yeast (gene GUF1) and in Caenorhabditis elegans (ZK1236.1).
- Yeast HBS1 [5].
- Rat statin S1 [6], a protein of unknown function which is highly similar to EF-1alpha.
- Prokaryotic selenocysteine-specific elongation factor selB [7], which seems to replace EF-Tu for the insertion of selenocysteine directed by the UGA codon.
- The tetracycline resistance proteins tetM/tetO [8,9] from various bacteria such as Campylobacter jejuni, Enterococcus faecalis, Streptococcus mutans and Ureaplasma urealyticum. Tetracycline binds to the prokaryotic ribosomal 30S subunit and inhibits binding of aminoacyl-tRNAs. These proteins abolish the inhibitory effect of tetracycline on protein synthesis.
- Rhizobium nodulation protein nodQ [10].
- Escherichia coli hypothetical protein yihK [11].
In EF-1alpha, a specific region has been shown [12] to be involved in a conformational change mediated by the hydrolysis of GTP to GDP. This region is conserved in both EF-1alpha/EF-Tu as well as EF-2/EF-G and thus seems typical for GTP-dependent proteins which bind non-initiator tRNAs to the ribosome. The pattern developed for this family of proteins includes that conserved region.

[0680] Consensus pattern: D-[KRSTGANQFYW]-x(3)-E-[KRAQ]-x-[RKQD]-[GC]-[IVMK]-[ST]-[IV]-x(2)-[GSTACKRNQ]

[1] Concise Encyclopedia Biochemistry, Second Edition, Walter de Gruyter, Berlin New-York (1988).
[2] Moldave K. Annu. Rev. Biochem. 54:1109-1149(1985).
[3] Stansfield I., Jones K.M., Kushnirov V.V., Dagkesamanskaya A.R., Poznyakovski A.I., Paushkin S.V., Nierras C.R., Cox B.S., Ter-Avanesyan M.D., Tuite M.F. EMBO J. 14:4365-4373(1995).
[4] Grentzmann G., Brechemier-Baey D., Heurgue-Hamard V., Buckingham R.H. J. Biol. Chem. 270:10595-10600(1995).
[5] Nelson R.J., Ziegelhoffer T., Nicolet C., Werner-Washburne M., Craig E.A. Cell 71:97-105(1992).
[6] Ann D.K., Moutsatsos I.K., Nakamura T., Lin H.H., Mao P.-L., Lee M.-J., Chin S., Liem R.K.H., Wang E. J. Biol. Chem. 266:10429-10437(1991).
[7] Forchhammer K., Leinfelder W., Bock A. Nature 342:453-456(1989).
[8] Manavathu E.K., Hiratsuka K., Taylor D.E. Gene 62:17-26(1988).
[9] LeBlanc D.J., Lee L.N., Titmas B.M., Smith C.J., Tenover F.C. J. Bacteriol. 170:3618-3626(1988).
[10] Cervantes E., Sharma S.B., Maillet F., Vasse J., Truchet G., Rosenberg C. Mol. Microbiol. 3:745-755(1989).
[11] Plunkett G. III, Burland V.D., Daniels D.L., Blattner F.R. Nucleic Acids Res. 21:3391-3398(1993).
[12] Moller W., Schipper A., Amons R. Biochimie 69:983-989(1987).



[0681] 228. GTP cyclohydrolase II
GTP cyclohydrolase II catalyses the first committed step in the biosynthesis of riboflavin.

[0682] [1] Richter G., Ritz H., Katzenmeier G., Volk R., Kohnle A., Lottspeich F., Allendorf D., Bacher A. J. Bacteriol. 1993; 175:4045-4051.

[0683] 229. Galactose-1-phosphate uridyl transferase signatures (GalP_UDP_transf)

Galactose-1-phosphate uridyl transferase (EC 2.7.7.10) (galT) catalyzes the transfer of a uridyldiphosphate group onto galactose (or glucose) 1-phosphate. During the reaction, the uridyl moiety links to a histidine residue. In the Escherichia coli enzyme, it has been shown [1] that two histidine residues separated by a single proline residue are essential for enzyme activity. On the basis of sequence similarities, two apparently unrelated families seem to exist. Class-I enzymes are found in eukaryotes as well as some bacteria such as Escherichia coli or Streptomyces lividans, while class-II enzymes have been found so far only in bacteria such as Bacillus subtilis or Lactobacillus helveticus [2]. Signature patterns for both families were developed. For class-I enzymes the signature is based on the active site residues. For class-II enzymes a region which also includes two conserved histidines was chosen.
Consensus pattern: F-E-N-[RK]-G-x(3)-G-x(4)-H-P-H-x-Q [The two H's are the active site residues]
[0684] Consensus pattern: D-L-P-I-V-G-G-[ST]-[LIVM](2)-[SA]-H-[DEN]-H-[FY]-Q-G-G
Note: class-I enzymes are structurally related to the HIT family of proteins (see <PDOC00694>).



[1] Reichardt J.K.V., Berg P. Nucleic Acids Res. 16:9017-9026(1988).
[2] Mollet B., Pilloud N. J. Bacteriol. 173:4464-4473(1991).

[0685] 230. Gamma-thionins family signature
[0686] The following small plant proteins are evolutionary related:

- Gamma-thionins from wheat endosperm (gamma-purothionins) and barley (gamma-hordothionins), which are toxic to animal cells and inhibit protein synthesis in cell-free systems [1].
- A flower-specific thionin (FST) from tobacco [2].
- Antifungal proteins (AFP) from the seeds of Brassicaceae species such as radish, mustard, turnip and Arabidopsis thaliana [3].
- Inhibitors of insect alpha-amylases from sorghum [4].
- Probable protease inhibitor P322 from potato.
- A germination-related protein from cowpea [5].
- Anther-specific protein SF18 from sunflower [6]. SF18 is a protein that contains a gamma-thionin domain at its N-terminus and a proline-rich C-terminal domain.
- Soybean sulfur-rich protein SE60 [7].
- Vicia faba antibacterial peptides fabatin-1 and -2.

[0687] In their mature form, these proteins generally consist of about 45 to 50 amino-acid residues. As shown in the following schematic representation, these peptides contain eight conserved cysteines involved in disulfide bonds.



xxCxxxxxxxxxxCxxxxxCxxxCxxxxxxxxxCxxxxxxCxCxxxC
************************

'C': conserved cysteine involved in a disulfide bond.
'*': position of the pattern.



[0688] Consensus pattern: [KRG]-x-C-x(3)-[SV]-x(2)-[FYWH]-x-[GF]-x-C-x(5)-C-x(3)-C [The four C's are involved in disulfide bonds]
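
As a consistency check, and not as part of the original disclosure, the gamma-thionin pattern can be expanded position by position and compared with the cysteine schematic. The sketch below assumes the pattern [KRG]-x-C-x(3)-[SV]-x(2)-[FYWH]-x-[GF]-x-C-x(5)-C-x(3)-C and the gap lengths of the schematic as read from the text. The helper name `fixed_positions` is hypothetical, and it only handles fixed-length elements.

```python
import re

PATTERN = "[KRG]-x-C-x(3)-[SV]-x(2)-[FYWH]-x-[GF]-x-C-x(5)-C-x(3)-C"

# Schematic rebuilt from explicit gap lengths (an assumption based on the
# schematic as read here), so that the spacing is easy to audit.
SCHEMATIC = ("xx" + "C" + "x" * 10 + "C" + "x" * 5 + "C" + "x" * 3 + "C"
             + "x" * 9 + "C" + "x" * 6 + "C" + "x" + "C" + "x" * 3 + "C")

def fixed_positions(pattern: str):
    """Expand a fixed-length PROSITE-style pattern into one element per position."""
    positions = []
    for element in pattern.split("-"):
        m = re.fullmatch(r"(.+?)(?:\((\d+)\))?", element)
        core, repeat = m.group(1), int(m.group(2) or 1)
        positions.extend([core] * repeat)
    return positions

positions = fixed_positions(PATTERN)
pattern_cys = [i + 1 for i, p in enumerate(positions) if p == "C"]
schematic_cys = [i + 1 for i, c in enumerate(SCHEMATIC) if c == "C"]

print(len(positions))   # number of positions covered by the pattern
print(pattern_cys)      # cysteine positions within the pattern
print(schematic_cys)    # cysteine positions within the schematic
```

Under these assumptions the pattern spans the N-terminal part of the schematic, and its four cysteines coincide with the first four of the eight conserved cysteines.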

[1] Bruix M., Jimenez M.A., Santoro J., Gonzalez C., Colilla F.J., Mendez E., Rico M. Biochemistry 32:715-724(1993).
[2] Gu Q., Kawata E.E., Morse M.-J., Wu H.-M., Cheung A.Y. Mol. Gen. Genet. 234:89-96(1992).
[3] Terras F.R.G., Torrekens S., van Leuven F., Osborn R.W., Vanderleyden J., Cammue B.P.A., Broekaert W.F. FEBS Lett. 316:233-240(1993).
[4] Bloch C. Jr., Richardson M. FEBS Lett. 279:101-104(1991).
[5] Ishibashi N., Yamauchi D., Minamikawa T. Plant Mol. Biol. 15:59-64(1990).
[7] Choi Y., Choi Y.D., Lee J.S. Plant Physiol. 101:699-700(1993).

[0689] 231. Gelsolin. Gelsolin repeat. Number of members: 170

[0690] [1] Medline: 97433077. The crystal structure of plasma gelsolin: implications for actin severing, capping, and nucleation. Burtnick LD, Koepf EK, Grimes J, Jones EY, Stuart DI, McLaughlin PJ, Robinson RC; Cell 1997;90:661-670.

[0691] 232. Germin family signature

Germins [1] are a family of homopentameric cereal glycoproteins expressed during germination which may play a role in altering the properties of cell walls during germinative growth. It has been shown that wheat and barley germins act as oxalate oxidases (EC 1.2.3.4), an enzyme that catalyzes the oxidative degradation of oxalate to carbonate and hydrogen peroxide. Germins are highly similar to:
- Germin-like proteins from various plants such as rape, violet or white mustard.
- Slime mold spherulins 1a and 1b, which are proteins that accumulate specifically during spherulation, a process induced by various forms of environmental stress which leads to encystment and dormancy.
As a signature pattern the best conserved region was selected: a decapeptide located in the central section of these proteins.
[0692] Consensus pattern: G-x(4)-H-x-H-P-x-A-x-E-[LIVM]
[0693] [1] Lane B.G. FASEB J. 8:294-301(1994).

[0694] 233. (GlutR)

Glutamyl-tRNA reductase signature

[0695] Delta-aminolevulinic acid (ALA) is the obligatory precursor for the synthesis of all tetrapyrroles, including porphyrin derivatives such as chlorophyll and heme. ALA can be synthesized via two different pathways: the Shemin (or C4) pathway, which involves the single-step condensation of succinyl-CoA and glycine and which is catalyzed by ALA synthase (EC 2.3.1.37), and the C5 pathway, which starts from the five-carbon skeleton of glutamate. The C5 pathway operates in the chloroplast of plants and algae, in cyanobacteria, in some eubacteria and in archaebacteria.
[0696] The initial step in the C5 pathway is carried out by glutamyl-tRNA reductase (GluTR) [1], which catalyzes the NADP-dependent conversion of glutamate-tRNA(Glu) to glutamate-1-semialdehyde (GSA) with the concomitant release of tRNA(Glu), which can then be recharged with glutamate by glutamyl-tRNA synthetase.

[0697] GluTR is a protein of about 50 Kd (467 to 550 residues) which contains a few conserved regions. The best conserved region is located in positions 99 to 122 in the sequence of known GluTR. This region seems important for the activity of the enzyme. We have developed a signature pattern from that conserved region.
[0698] Consensus pattern: H-[LIVM]-x(2)-[LIVM]-[GSTAC](3)-[LIVM]-[DEQ]-S-[LIVMA]-[LIVM](2)-[GF]-E-x-[EQR]-[IV]-[LIT]-[STAG]-Q-[LIVM]-[KR] Sequences known to belong to this class detected by the pattern: ALL.

[0699] [1] Jahn D., Verkamp E., Soell D. Trends Biochem. Sci. 17:215-218(1992).
[0700] 234. (Glycoprotease)
Glycoprotease family signature (aka Peptidase_M22)

[0701] Glycoprotease (GCP) (EC 3.4.24.57) [1], or O-sialoglycoprotein endopeptidase, is a metalloprotease secreted by Pasteurella haemolytica which specifically cleaves O-sialoglycoproteins such as glycophorin A. The sequence of GCP is highly similar to the following uncharacterized proteins:

- Escherichia coli hypothetical protein ygjD (ORF-X).
- Bacillus subtilis hypothetical protein ydiE.
- Mycobacterium leprae hypothetical protein U229E.
- Mycobacterium tuberculosis hypothetical protein MtCY78.10.
- Synechocystis strain PCC 6803 hypothetical protein slr0807.
- Methanococcus jannaschii hypothetical protein MJ1130.
- Haloarcula marismortui hypothetical protein in HSH 3' region.
- Yeast hypothetical protein YKR038c.
- Yeast hypothetical protein QRI7.

[0702] One of the conserved regions contains two conserved histidines. It is possible that this region is involved in coordinating a metal ion such as zinc.
[0703] Consensus pattern: [KR]-[GSAT]-x(4)-[FYWLH]-[DQNGK]-x-P-x-[LIVMFY]-x(3)-H-x(2)-[AG]-H-[LIVM] Sequences known to belong to this class detected by the pattern: ALL.

[0704] Note: these proteins belong to family M22 in the classification of peptidases [2,E1].

[1] Abdullah K.M., Lo R.Y.C., Mellors A. J. Bacteriol. 173:5597-5603(1991).

[2] Rawlings N.D., Barrett A.J. Meth. Enzymol. 248:183-228(1995).

[0705] 235. (Glucosamine_iso)

Glucosamine/galactosamine-6-phosphate isomerases signature

Glucosamine-6-phosphate isomerase (EC 5.3.1.10) (or GlcN-6-P deaminase) is the enzyme responsible for the conversion of glucosamine 6-phosphate into fructose 6-phosphate [1]. It is the last specific step in the pathway for N-acetylglucosamine (GlcNAC) utilization in bacteria such as Escherichia coli (gene nagB) or in fungi such as Candida albicans (gene NAG1). GlcN-6-P isomerase is evolutionary related to:
- A putative Escherichia coli galactosamine-6-phosphate isomerase (gene agaI) [2].
- Escherichia coli hypothetical protein yieK.
- Bacillus subtilis hypothetical protein ybfT.
As a signature pattern, a conserved region located in the central part of these enzymes was selected. This region contains a conserved histidine which has been shown [1], in nagB, to be important for the pyranose ring-opening step of the catalytic mechanism.

[0706] Consensus pattern: [LIVM]-x(3)-G-x-[LIT]-x-[LIV]-x-[LIVM]-x-G-[LIVM]-G-x-[DEN]-G-H

[1] Oliva G., Fontes M.R.M., Garratt R.C., Altamirano M.M., Calcagno M.L., Horjales E. Structure 3:1323-1332(1995).

[2] Reizer J., Ramseier T.M., Reizer A., Charbit A., Saier M.H. Jr. Microbiology 142:231-250(1996).

[0707] 236. Pneumovirus attachment glycoprotein G (glycoprotein_G)

[0708] This family includes attachment proteins from respiratory syncytial virus. Glycoprotein G has not been shown to have any neuraminidase or hemagglutinin activity (Swiss-Prot). The amino terminus is thought to be cytoplasmic, and the carboxyl terminus extracellular. The extracellular region contains four completely conserved cysteine residues.
[0709] [1] Johnson PR, Spriggs MK, Olmsted RA, Collins PL; Proc Natl Acad Sci U S A 1987;84:5625-5629.
[0710] 237. Glycosyl transferases group 1

[0711] Mutations in this domain of Swiss:P37287 lead to disease (paroxysmal nocturnal haemoglobinuria). Members of this family transfer activated sugars to a variety of substrates, including glycogen, fructose-6-phosphate and lipopolysaccharides. Members of this family transfer UDP-, ADP-, GDP- or CMP-linked sugars. The eukaryotic glycogen synthases may be distant members of this family.
[0712] 238. Glycosyl transferases (Glycos_transf_2)

[0713] Diverse family, transferring sugar from UDP-glucose, UDP-N-acetyl-galactosamine, GDP-mannose or CDP-abequose to a range of substrates including cellulose, dolichol phosphate and teichoic acids.
[0714] 239. (Glycos_transf_3)

Thymidine and pyrimidine-nucleoside phosphorylases signature

[0715] Thymidine phosphorylase (EC 2.4.2.4) catalyzes the reversible phosphorolysis of thymidine, deoxyuridine and their analogues to their respective bases and 2-deoxyribose 1-phosphate. This enzyme regulates the availability of thymidine and is therefore essential to nucleic acid metabolism.

[0716] In Escherichia coli (gene deoA), the enzyme is a dimer of identical subunits of about 43 Kd [1]. In humans it was first identified as platelet-derived endothelial cell growth factor (PD-ECGF) before being recognized [2] as thymidine phosphorylase.

[0717] Bacterial pyrimidine-nucleoside phosphorylase (EC 2.4.2.2) (gene pdp) [3] is an enzyme evolutionary and structurally related to thymidine phosphorylase.

[0718] A well conserved region of 19 residues located in the N-terminal part of these proteins was selected as a signature pattern for these enzymes.

[0719] Consensus pattern: S-[GS]-R-[GA]-[LIV]-x(2)-[TA]-[GA]-G-T-x-D-x-[LIV]-E Sequences known to belong to this class detected by the pattern: ALL.

[1] Walter M.R., Cook W.J., Cole L.B., Short S.A., Koszalka G.W., Krenitsky T.A., Ealick S.E. J. Biol. Chem. 265:14016-14022(1990).

[2] Furukawa T., Yoshimura A., Sumizawa T., Haraguchi M., Akiyama S.-I., Fukui K., Yamada Y. Nature 356:668-668(1992).

[3] Saxild H.H., Andersen L.N., Hammer K. J. Bacteriol. 178:424-434(1996).
[0720] 240. Glycos_transf_4. Glycosyl transferase. Number of members: 44
[0721] [1] Medline: 95252686. A family of UDP-GlcNAc/MurNAc: polyisoprenol-P GlcNAc/MurNAc-1-P transferases. Lehrman MA; Glycobiology 1994;4:768-771.

[0722] 241. Glycosyl hydrolases family 15. 21 members

[0723] 242. Glycosyl hydrolases family 16 signature

It has been shown [1] that the following glycosyl hydrolases can be classified into a single family on the basis of sequence similarities:
- Bacterial beta-1,3-1,4-glucanases, or lichenases (EC 3.2.1.73), mainly from Bacillus but also from Clostridium thermocellum (gene licB), Fibrobacter succinogenes and Rhodothermus marinus (gene bglA).
- Bacillus circulans beta-1,3-glucanase A1 (EC 3.2.1.39) (gene glcA).
- Laminarinase (EC 3.2.1.6) from Clostridium thermocellum (gene lam1).
- Streptomyces coelicolor agarase (EC 3.2.1.81) (gene dagA).
- Alteromonas carrageenovora kappa-carrageenase (EC 3.2.1.83) (gene cgkA).
Two closely clustered conserved glutamates have been shown [2] to be involved in the catalytic activity of Bacillus licheniformis lichenase. The region that contains these residues was used as a signature pattern.

[0724] Consensus pattern: E-[LIV]-D-[LIV]-x(0,1)-E-x(2)-[GQ]-[KRNF]-x-[PSTA] [The two E's are active site residues]

[1] Henrissat B. Biochem. J. 280:309-316(1991).

[2] Juncosa M., Pons J., Dot T., Querol E., Planas A. J. Biol. Chem. 269:14530-14535(1994).

[0725] 243. Glycosyl hydrolases family 17 signature

It has been shown [1,2] that the following glycosyl hydrolases can be classified into a single family on the basis of sequence similarities:
- Glucan endo-1,3-beta-glucosidases (EC 3.2.1.39) (endo-(1->3)-beta-glucanase) from various plants. This enzyme may be involved in the defense of plants against pathogens through its ability to degrade fungal cell wall polysaccharides.
- Glucan 1,3-beta-glucosidase (EC 3.2.1.58) (exo-(1->3)-beta-glucanase) from yeast (gene BGL2). This enzyme may play a role in cell expansion during growth, in cell-cell fusion during mating, and in spore release during sporulation.
- Lichenases (EC 3.2.1.73) (endo-(1->3,1->4)-beta-glucanase) from various plants.
The best conserved region in the sequence of these enzymes is located in their central section. This region contains a conserved tryptophan residue which could be involved in the interaction with the glucan substrates [2], and it also contains a conserved glutamate which has been shown [3] to act as the nucleophile in the catalytic mechanism. This region was used as a signature pattern.

Consensus pattern: [LIVM]-x-[LIVMFYWA](3)-[STAG]-E-[STA]-G-W-P-[STN]-x-[SAGQ] [E is an active site residue]

[1] Henrissat B. Biochem. J. 280:309-316(1991).

[2] Ori N., Sessa G., Lotan T., Himmelhoch S., Fluhr R. EMBO J. 9:3429-3436(1990).

[3] Varghese J.N., Garrett T.P.J., Colman P.M., Chen L., Hoj P.B., Fincher G.B. Proc. Natl. Acad. Sci. U.S.A. 91:2785-2789(1994).

[0726] 244. Glyoxalase I signatures

Glyoxalase I (EC 4.4.1.5) (lactoylglutathione lyase) catalyzes the first step of the glyoxal pathway, the transformation of methylglyoxal and glutathione into S-lactoylglutathione, which is then converted by glyoxalase II to lactic acid [1]. Glyoxalase I is a ubiquitous enzyme which binds one mole of zinc per subunit. The bacterial and yeast enzymes are monomeric while the mammalian one is homodimeric. The sequence of glyoxalase I is well conserved. In bacteria and mammals, the enzyme is a protein of about 130 to 180 residues, while in fungi it is about twice as long; in these organisms the enzyme is built out of the tandem repeat of a homologous domain. Two signature patterns for this family were derived. The first one is located in the N-terminal region while the second one is located in the central section of the protein and contains a conserved histidine that could be implicated in the binding of the zinc atom.

[0727] Consensus pattern: [HQ]-[IVT]-x-[LIVFY]-x-[IV]-x(5)-[STA]-x(2)-F-[YM]-x(2,3)-[LMF]-G-[LMF]
Consensus pattern: G-[NTKQ]-x(0,5)-[GA]-[LVFY]-[GH]-H-[IVF]-[CGA]-x-[STAGLE]-x(2)-[DNC]
[0728] [1] Kim N.-S., Umezawa Y., Ohmura S., Kato S. J. Biol. Chem. 268:11217-11221(1993).
[0729] 245. (Glypican)

Glypicans signature

Glypicans [1,2] are a family of heparan sulfate proteoglycans which are anchored to cell membranes by a glycosylphosphatidylinositol (GPI) linkage. Structurally, these proteins consist of three separate domains:

a) A signal sequence;

b) An extracellular domain of about 500 residues that contains 12 conserved cysteines probably involved in disulfide bonds and which also contains the sites of attachment of the heparan sulfate glycosaminoglycan side chains;
c) A C-terminal hydrophobic region which is post-translationally removed after formation of the GPI-anchor.
[0730] The proteins known to belong to this family are:

- Glypican 1 (GPC1).
- Glypican 2 (GPC2) or cerebroglycan.
- Glypican 3 (GPC3) or OCI-5. In man, defects in GPC3 are the cause of an X-linked genetic disease, Simpson-Golabi-Behmel syndrome (SGBS).
- K-glypican.
- Glypican 5 (GPC5).
- Drosophila protein dally.

[0731] The signature pattern that was developed for glypicans is located in the central section of the extracellular domain and contains five of the conserved cysteines.

[0732] Consensus pattern: C-x(2)-C-x-G-[LIVM]-x(4)-P-C-x(2)-[FY]-C-x(2)-[LIVM]-x(2)-G-C [The C's are probably involved in disulfide bonds] Sequences known to belong to this class detected by the pattern: ALL, except for dally.


[1] Weksberg R., Squire J.A., Templeton D.M. Nat. Genet. 12:225-227(1996).
[2] Watanabe K., Yamada H., Yamaguchi Y. J. Cell Biol. 130:1207-1218(1995).

[0733] 246. Granins signatures

Granins (chromogranins or secretogranins) [1] are a family of acidic proteins present in the secretory granules of a wide variety of endocrine and neuro-endocrine cells. The exact function(s) of these proteins is not yet known, but they seem to be the precursors of biologically active peptides and/or they may act as helper proteins in the packaging of peptide hormones and neuropeptides. Three members of this family of proteins show some sequence similarities:
- Chromogranin A (CGA) [2]. CGA is a protein of about 420 residues; it is the precursor of the peptide pancreastatin, which strongly inhibits glucose-induced insulin release from the pancreas.
- Secretogranin 1 (chromogranin B). A sulfated protein of about 600 residues.
- Secretogranin 2 (chromogranin C). A sulfated protein of about 600 residues.
Apart from their subcellular location and the abundance of acidic residues (Asp and Glu), these proteins do not share many structural similarities. Only one short region, located in the C-terminal section, is conserved in all these proteins. Chromogranins A and B share a region of high similarity in their N-terminal section; this region includes two cysteine residues involved in a disulfide bond.

[0734] Consensus pattern: [DE]-[SN]-L-[SAN]-x(2)-[DE]-x-E-L

Consensus pattern: C-[LIVM](2)-E-[LIVM](2)-S-[DN]-[STA]-L-x-K-x-S-x(3)-[LIVM]-[STA]-x-E-C [The two C's are linked by a disulfide bond]

[1] Huttner W.B., Gerdes H.-H., Rosa P. Trends Biochem. Sci. 16:27-30(1991).

[2] Simon J.-P., Aunis D. Biochem. J. 262:1-13(1989).

[0735] 247. grpE protein signature

In prokaryotes the grpE protein [1] stimulates, jointly with dnaJ, the ATPase activity of the dnaK chaperone. It seems to accelerate the release of ADP from dnaK, thus allowing dnaK to recycle more efficiently. GrpE is a protein of about 22 to 25 Kd. In yeast, an evolutionary related mitochondrial protein (gene GRPE) has been shown [2] to associate with the mitochondrial hsp70 protein and thus to play a role in the import of proteins from the cytoplasm. As a signature pattern, the most conserved region of grpE was selected. It is located in the C-terminal section.
[0736] Consensus pattern: [FL]-[DN]-[PHEA]-x(2)-[HM]-x-A-[LIVMTN]-x(16,20)-G-[FY]-x(3)-[DEG]-x(2)-[LIVM]-[RI]-x-[SA]-x-V-x-[IV]

[1] Georgopoulos C., Welch W. Annu. Rev. Cell Biol. 9:601-635(1993).

[2] Bolliger L., Deloche O., Glick B.S., Georgopoulos C., Jenoe P., Kronidou N., Horst M., Morishima N., Schatz G. EMBO J. 13:1998-2008(1994).

[0737] 248. Guanylate kinase signature and profile

Guanylate kinase (EC 2.7.4.8) (GK) [1] catalyzes the ATP-dependent phosphorylation of GMP into GDP. It is essential
for recycling GMP and, indirectly, cGMP. In prokaryotes (such as Escherichia coli), lower eukaryotes (such as yeast)
and in vertebrates, GK is a highly conserved monomeric protein of about 200 amino acids. GK has been shown [2,3,4]
to be structurally similar to the following proteins:

- Protein A57R (or SalG2R) from various strains of Vaccinia virus. This protein is highly similar to GK, but contains
a frameshift mutation in the N-terminal section and could therefore be inactive in that virus.

The following proteins are characterized by the presence in their sequence of one or more copies of the DHR domain,
a SH3 domain (see PDOC50002) as well as a C-terminal GK-like domain. These proteins are collectively termed MAGUKs
(membrane-associated guanylate kinase homologs) [5]:

- Drosophila lethal(1)discs large-1 tumor suppressor protein (gene dlg1). This protein is associated with septate
junctions in developing flies, and defects in the dlg1 gene cause neoplastic overgrowth of the imaginal disks.
- Mammalian tight junction protein Zo-1.
- A family of mammalian synaptic proteins that seem to interact with the cytoplasmic tail of NMDA receptor subunits.
This family currently consists of SAP90/PSD-95, CHAPSYN-110/PSD-93, SAP97/DLG1 and SAP102.
- Vertebrate 55 Kd erythrocyte membrane protein (p55). p55 is a palmitoylated, membrane-associated protein of
unknown function.
- Caenorhabditis elegans protein lin-2, which may play a structural role in the induction of the vulva.
- Rat protein CASK.
- Human protein DLG2.
- Human protein DLG3.

There is an ATP-binding site (P-loop) in the N-terminal section of GK. This region is not conserved in the GK-like
domain of the above proteins, which are therefore unlikely to be kinases. However, these proteins retain the residues
known, in GK, to be involved in the binding of GMP. As a signature pattern, a highly conserved region was selected
that contains two arginines and a tyrosine which are involved in GMP-binding.
[0738] Consensus pattern: T-[ST]-R-x(2)-[KR]-x(2)-[DE]-x(2)-G-x(2)-Y-x-[FY]-[LIVMK]
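
The consensus patterns in this section follow the PROSITE notation: elements are separated by "-", x stands for any
residue, square brackets list the residues allowed at a position, curly braces list residues that are excluded, and
(n) or (n,m) gives a repeat count. As a minimal sketch, and not as part of the disclosed method, such a pattern can be
translated into a regular expression. The function name prosite_to_regex and the toy sequence are hypothetical, the
guanylate kinase pattern is assumed to read as written in the code, and the N- and C-terminal anchors "<" and ">" are
not handled.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression.

    Hypothetical helper for illustration; anchors ('<', '>') are not supported.
    """
    elements = [e for e in pattern.strip().rstrip(".").split("-") if e]
    parts = []
    for element in elements:
        # Split an element into its core and an optional repeat count, e.g. x(2) or x(2,3).
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, lo, hi = m.group(1), m.group(2), m.group(3)
        if core == "x":
            core = "."                       # x = any residue
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"   # {AB} = any residue except A or B
        # [ABC] and single letters are already valid regex fragments.
        if lo is None:
            repeat = ""
        elif hi is None:
            repeat = "{" + lo + "}"
        else:
            repeat = "{" + lo + "," + hi + "}"
        parts.append(core + repeat)
    return "".join(parts)

# Guanylate kinase signature as read from the text above (an assumption).
GK_PATTERN = "T-[ST]-R-x(2)-[KR]-x(2)-[DE]-x(2)-G-x(2)-Y-x-[FY]-[LIVMK]"
gk_regex = re.compile(prosite_to_regex(GK_PATTERN))

# Invented test sequence; it is not a real guanylate kinase.
toy_sequence = "MMTSRAAKAADAAGAAYAFLMM"
match = gk_regex.search(toy_sequence)
if match:
    # Report the matched segment and its 1-based start position.
    print(match.group(0), match.start() + 1)
```

The same helper can be applied to the other consensus patterns listed in this section, provided they are first
restored to valid PROSITE syntax.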

[ 1] Stehle T., Schulz G.E. J. Mol. Biol. 224:1127-1141(1992).
[ 2] Bryant P.J., Woods D.F. Cell 68:621-622(1992).
[ 3] Goebl M.G. Trends Biochem. Sci. 17:99-99(1992).
[ 4] Zschocke P.D., Schiltz E., Schulz G.E. Eur. J. Biochem. 213:263-269(1993).
[ 5] Woods D.F., Bryant P.J. Mech. Dev. 44:85-89(1993).

[0739] 249. (Glyco_hydro_35)

Glycosyl hydrolases family 35 putative active site

[0740] Beta-galactosidases (EC 3.2.1.23) from mammals, fungi, plants and the bacteria Xanthomonas manihotis are
evolutionarily related [1,2]. They belong to family 35 in the classification of glycosyl hydrolases [3,E1].
[0741] Mammalian beta-galactosidase is a lysosomal enzyme (gene GLB1) which cleaves the terminal galactose
from gangliosides, glycoproteins and glycosaminoglycans and whose deficiency is the cause of the genetic disease
Gm(1) gangliosidosis (Morquio disease type B).

[0742] One of the best conserved regions in these enzymes contains a glutamic acid residue which, on the basis of
similarities with other families of glycosyl hydrolases [4], probably acts as the proton donor in the catalytic
mechanism. This region was used as a signature pattern.
[0743] Consensus pattern: G-G-P-[LIVM](2)-x(2)-Q-x-E-N-E-[FY] [The second E is the putative active site residue]
Sequences known to belong to this class detected by the pattern: ALL.

[ 1] Taron C.H., Benner J.S., Hornstra L.J., Guthrie E.P. Glycobiology 5:603-610(1995).
[ 2] Carey A.T., Holt K., Picard S., Wilde R., Tucker G.A., Bird C.R., Schuch W., Seymour G.B. Plant Physiol. 108:
1099-1107(1995).
[ 3] Henrissat B., Bairoch A. Biochem. J. 293:781-788(1993).
[ 4] Henrissat B., Callebaut I., Fabrega S., Lehn P., Mornon J.-P., Davies G. Proc. Natl. Acad. Sci. U.S.A. 92:
7090-7094(1995).

[0744] 250. (Glyco_hydro_16)

Glycosyl hydrolases family 16 signature

[0745] It has been shown [1] that the following glycosyl hydrolases can be classified into a single family on the basis
of sequence similarities:

- Bacterial beta-1,3-1,4-glucanases, or lichenases (EC 3.2.1.73), mainly from Bacillus but also from Clostridium
thermocellum (gene licB), Fibrobacter succinogenes and Rhodothermus marinus (gene bglA).

- Bacillus circulans beta-1,3-glucanase A1 (EC 3.2.1.39) (gene glcA).

- Laminarinase (EC 3.2.1.6) from Clostridium thermocellum (gene lam1).

- Streptomyces coelicolor agarase (EC 3.2.1.81) (gene dagA).

- Alteromonas carrageenovora kappa-carrageenase (EC 3.2.1.83) (gene cgkA).

[0746] Two closely clustered conserved glutamates have been shown [2] to be involved in the catalytic activity of
Bacillus licheniformis lichenase. The region that contains these residues was used as a signature pattern.

[0747] Consensus pattern: E-[LIV]-D-[LIVF]-x(0,1)-E-x(2)-[GQ]-[KRNF]-x-[PSTA] [The two E's are active site residues]

[ 1] Henrissat B. Biochem. J. 280:309-316(1991).

[ 2] Juncosa M., Pons J., Dot T., Querol E., Planas A. J. Biol. Chem. 269:14530-14535(1994).

[0748] 251. (Glyco_hydro_17)
Glycosyl hydrolases family 17 signature
(aka glycosyl_hydro4)

[0749] It has been shown [1,2] that the following glycosyl hydrolases can be classified into a single family on the
basis of sequence similarities:

- Glucan endo-1,3-beta-glucosidases (EC 3.2.1.39) (endo-(1->3)-beta-glucanase) from various plants. This enzyme
may be involved in the defense of plants against pathogens through its ability to degrade fungal cell wall
polysaccharides.

- Glucan 1,3-beta-glucosidase (EC 3.2.1.58) (exo-(1->3)-beta-glucanase) from yeast (gene BGL2). This enzyme
may play a role in cell expansion during growth, in cell-cell fusion during mating, and in spore release during
sporulation.

- Lichenases (EC 3.2.1.73) (endo-(1->3,1->4)-beta-glucanase) from various plants.

[0750] The best conserved region in the sequence of these enzymes is located in their central section. This region
contains a conserved tryptophan residue which could be involved in the interaction with the glucan substrates [2] and
it also contains a conserved glutamate which has been shown [3] to act as the nucleophile in the catalytic mechanism.
This region was used as a signature pattern.

[0751] Consensus pattern: [LIVM]-x-[LIVMFYWA](3)-[STAG]-E-[STA]-G-W-P-[STN]-x-[SAGQ] [E is an active site residue]
Sequences known to belong to this class detected by the pattern: ALL.

[ 1] Henrissat B. Biochem. J. 280:309-316(1991).

[ 2] Ori N., Sessa G., Lotan T., Himmelhoch S., Fluhr R. EMBO J. 9:3429-3436(1990).

[ 3] Varghese J.N., Garrett T.P.J., Colman P.M., Chen L., Hoj P.J., Fincher G.B. Proc. Natl. Acad. Sci. U.S.A. 91:
2785-2789(1994).

[0752] 252. (Glyco_hydro_3)

Glycosyl hydrolases family 3 active site

[0753] It has been shown [1,2] that the following glycosyl hydrolases can be, on the basis of sequence similarities,
classified into a single family:

- Beta glucosidases (EC 3.2.1.21) from the fungi Aspergillus wentii (A-3), Hansenula anomala, Kluyveromyces
fragilis, Saccharomycopsis fibuligera (BGL1 and BGL2), Schizophyllum commune and Trichoderma reesei (BGL1).
- Beta glucosidases from the bacteria Agrobacterium tumefaciens (Cbg1), Butyrivibrio fibrisolvens (bglA),
Clostridium thermocellum (bglB), Escherichia coli (bglX), Erwinia chrysanthemi (bgxA) and Ruminococcus albus.

- Alteromonas strain O-7 beta-hexosaminidase A (EC 3.2.1.52).
- Bacillus subtilis hypothetical protein yzbA.

- Escherichia coli hypothetical protein ycfO and HI0959, the corresponding Haemophilus influenzae protein.

[0754] One of the conserved regions in these enzymes is centered on a conserved aspartic acid residue which has
been shown [3], in Aspergillus wentii beta-glucosidase A3, to be implicated in the catalytic mechanism. This region
was used as a signature pattern.

[0755] Consensus pattern: [LIVM](2)-[KR]-x-[EQK]-x(4)-G-[LIVMFT]-[LIVT]-[LIVMF]-[ST]-D-x(2)-[SGADNI] [D is the
active site residue] Sequences known to belong to this class detected by the pattern: ALL.

[ 1] Henrissat B. Biochem. J. 280:309-316(1991).

[ 2] Castle L.A., Smith K.D., Morris R.O. J. Bacteriol. 174:1478-1486(1992).
[ 3] Bause E., Legler G. Biochim. Biophys. Acta 626:459-465(1980).

[0756] 253. (Glyco_hydro_28)

Polygalacturonase active site (aka PG)

[0757] Polygalacturonase (EC 3.2.1.15) (PG) (pectinase) [1,2] catalyzes the random hydrolysis of
1,4-alpha-D-galactosiduronic linkages in pectate and other galacturonans. In fruit, polygalacturonase plays an
important role in cell wall metabolism during ripening. In plant bacterial pathogens such as Erwinia carotovora or
Pseudomonas solanacearum and fungal pathogens such as Aspergillus niger, polygalacturonase is involved in maceration
and soft-rotting of plant tissue.

[0758] Exo-poly-alpha-D-galacturonosidase (EC 3.2.1.82) (exoPG) [3] hydrolyzes pectic acid from the non-reducing
end, releasing digalacturonate.
[0759] Prokaryotic and eukaryotic PG and exoPG share a few regions of sequence similarity. The best conserved of
these regions was selected. It is centered on a conserved histidine most probably involved in the catalytic mechanism.

[0760] Consensus pattern: [GSDENKRH]-x(2)-[VMFC]-x(2)-[GS]-H-G-[LIVMAG]-x(1,2)-[LIVM]-G-S [H is the putative
active site residue] Sequences known to belong to this class detected by the pattern: ALL.

[0761] Note: these proteins belong to family 28 in the classification of glycosyl hydrolases [5].

[ 1] Ruttkowski E., Labitzke R., Khanh N.Q., Loeffler F., Gottschalk M., Jany K.-D. Biochim. Biophys. Acta 1087:
104-106(1990).

[ 2] Huang J., Schell M.A. J. Bacteriol. 172:3879-3887(1990).

[ 3] He S.Y., Collmer A. J. Bacteriol. 172:4988-4995(1990).
[ 4] Bussink H.J.D., Buxton F.P., Visser J. Curr. Genet. 19:467-474(1991).
[ 5] Henrissat B. Biochem. J. 280:309-316(1991).

[0762] 254. (Glyco_hydro_32)

Glycosyl hydrolases family 32 active site

[0763] It has been shown [1,2] that the following glycosyl hydrolases can be classified into a single family on the
basis of sequence similarities:

- Inulinase (EC 3.2.1.7) (or inulase) from the fungi Kluyveromyces marxianus.

- Beta-fructofuranosidase (EC 3.2.1.26), commonly known as invertase in fungi and plants and as sucrase in bacteria
(gene sacA or scrB).

- Raffinose invertase (EC 3.2.1.26) (gene rafD) from Escherichia coli plasmid pRSD2.

- Levanase (EC 3.2.1.65) (gene sacC) from Bacillus subtilis.

[0764] One of the conserved regions in these enzymes is located in the N-terminal section and contains an aspartic
acid residue which has been shown [3], in yeast invertase, to be important for the catalytic mechanism. This region was
used as a signature pattern.

[0765] Consensus pattern: H-x(2)-P-x(4)-[LIVM]-N-D-P-N-G [D is the active site residue] Sequences known to belong
to this class detected by the pattern: ALL.

[ 1] Henrissat B. Biochem. J. 280:309-316(1991).

[ 2] Gunasekaran P., Karunakaran T., Cami B., Mukundan A.G., Preziosi L., Baratti J. J. Bacteriol. 172:6727-6735
(1990).

[ 3] Reddy V.A., Maley F. J. Biol. Chem. 265:10817-10820(1990).

[0766] 255. (Glyco_hydro_1)
Glycosyl hydrolases family 1 signatures

[0767] It has been shown [1 to 4] that the following glycosyl hydrolases can be, on the basis of sequence similarities,
classified into a single family:

- Beta-glucosidases (EC 3.2.1.21) from various bacteria such as Agrobacterium strain ATCC 21400, Bacillus
polymyxa, and Caldocellum saccharolyticum.
- Two plants (clover) beta-glucosidases (EC 3.2.1.21).
- Two different beta-galactosidases (EC 3.2.1.23) from the archaebacteria Sulfolobus solfataricus (genes bgaS and
lacS).

- 6-phospho-beta-galactosidases (EC 3.2.1.85) from various bacteria such as Lactobacillus casei, Lactococcus
lactis, and Staphylococcus aureus.

- 6-phospho-beta-glucosidases (EC 3.2.1.86) from Escherichia coli (genes bglB and ascB) and from Erwinia
chrysanthemi (gene arbB).

- Plants myrosinases (EC 3.2.3.1) (sinigrinase) (thioglucosidase).

- Mammalian lactase-phlorizin hydrolase (LPH) (EC 3.2.1.108 / EC 3.2.1.62). LPH, an integral membrane
glycoprotein, is the enzyme that splits lactose in the small intestine. LPH is a large protein of about 1900 residues
which contains four tandem repeats of a domain of about 450 residues which is evolutionarily related to the above
glycosyl hydrolases.

[0768] One of the conserved regions in these enzymes is centered on a conserved glutamic acid residue which has
been shown [5], in the beta-glucosidase from Agrobacterium, to be directly involved in glycosidic bond cleavage by
acting as a nucleophile. This region was used as a signature pattern. As a second signature pattern, a conserved
region found in the N-terminal extremity of these enzymes was selected; this region also contains a glutamic acid
residue.
[0769] Consensus pattern: [LIVMFSTC]-[LIVFYS]-[LIV]-[LIVMST]-E-N-G-[LIVMFAR]-[CSAGN] [E is the active site
residue] Sequences known to belong to this class detected by the pattern: ALL.
[0770] Note: this pattern will pick up the last two domains of LPH; the first two domains, which are removed from the
LPH precursor by proteolytic processing, have lost the active site glutamate and may therefore be inactive [4].
[0771] Consensus pattern: F-x-[FYWM]-[GSTA]-x-[GSTA]-x-[GSTA](2)-[FYNH]-[NQ]-x-E-x-[GSTA] Sequences
known to belong to this class detected by the pattern: ALL.
[0772] Note: this pattern will pick up the last three domains of LPH.
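
The two notes above state that the active site pattern picks up the last two LPH domains while the N-terminal pattern
picks up the last three. The following is a toy illustration of how such per-domain hits could be counted, under
stated assumptions: the two regular expressions are hand translations of the family 1 patterns, the helper name
find_hits is hypothetical, and the four-domain sequence is invented for the example and is not the LPH sequence.

```python
import re

# Hand-translated regexes for the two family 1 signatures (assumed readings).
ACTIVE_SITE = r"[LIVMFSTC][LIVFYS][LIV][LIVMST]ENG[LIVMFAR][CSAGN]"
N_TERMINAL = r"F.[FYWM][GSTA].[GSTA].[GSTA]{2}[FYNH][NQ].E.[GSTA]"

def find_hits(sequence, regex):
    """Return 1-based (start, end) positions of all matches, overlapping ones included."""
    # A capturing group inside a lookahead lets finditer report overlapping matches.
    lookahead = re.compile("(?=(" + regex + "))")
    return [(m.start(1) + 1, m.end(1)) for m in lookahead.finditer(sequence)]

# Invented motif instances; the *_LOST variants replace the key glutamate.
NTERM_OK = "FLWGAATSAYQIEGA"
NTERM_LOST = "FLWGAATSAYQIKGA"
ACTIVE_OK = "LYITENGAA"
ACTIVE_LOST = "LYITQNGAA"
SPACER = "PPPPP"

# Toy protein with four tandem "domains":
#   domain 1 has neither signature, domain 2 has only the N-terminal one,
#   domains 3 and 4 have both.
toy_protein = SPACER.join([
    NTERM_LOST + SPACER + ACTIVE_LOST,
    NTERM_OK + SPACER + ACTIVE_LOST,
    NTERM_OK + SPACER + ACTIVE_OK,
    NTERM_OK + SPACER + ACTIVE_OK,
])

active_hits = find_hits(toy_protein, ACTIVE_SITE)
nterm_hits = find_hits(toy_protein, N_TERMINAL)
print("active site hits:", active_hits)
print("N-terminal hits:", nterm_hits)
```

In this toy sequence the active site regex matches two domains and the N-terminal regex matches three, which mirrors
the behavior the notes describe for LPH.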

[ 1] Henrissat B. Biochem. J. 280:309-316(1991).
[ 2] Henrissat B. Protein Seq. Data Anal. 4:61-62(1991).
[ 3] Gonzalez-Candelas L., Ramon D., Polaina J. Gene 95:31-38(1990).
[ 4] El Hassouni M., Henrissat B., Chippaux M., Barras F. J. Bacteriol. 174:765-777(1992).
[ 5] Withers S.G., Warren R.A.J., Street I.P., Rupitz K., Kempton J.B., Aebersold R. J. Am. Chem. Soc. 112:
5887-5889(1990).

[0773] 256. Glyco_hydro_20
Glycosyl hydrolase family 20
Previous Pfam IDs: glycosyl_hydr11;
Number of members: 33
[0774] 257. (Glyco_hydro_9)
Glycosyl hydrolases family 9 active sites signatures
(aka glycosyl_hydr12)

[0775] The microbial degradation of cellulose and xylans requires several types of enzymes such as endoglucanases
(EC 3.2.1.4), cellobiohydrolases (EC 3.2.1.91) (exoglucanases), or xylanases (EC 3.2.1.8) [1,2]. Fungi and bacteria
produce a spectrum of cellulolytic enzymes (cellulases) and xylanases which, on the basis of sequence similarities,
can be classified into families. One of these families is known as the cellulase family E [3] or as the glycosyl
hydrolases family 9 [4,E1]. The enzymes which are currently known to belong to this family are listed below.

- Butyrivibrio fibrisolvens cellodextrinase 1 (ced1).

- Cellulomonas fimi endoglucanases B (cenB) and C (cenC).
- Clostridium cellulolyticum endoglucanase G (celCCG).

- Clostridium cellulovorans endoglucanase C (engC).
- Clostridium stercorarium endoglucanase Z (avicelase I) (celZ).

- Clostridium thermocellum endoglucanases D (celD), F (celF) and I (celI).

- Fibrobacter succinogenes endoglucanase A (endA).

- Pseudomonas fluorescens endoglucanase A (celA).

- Streptomyces reticuli endoglucanase 1 (cel1).
- Thermomonospora fusca endoglucanase E-4 (celD).

- Dictyostelium discoideum spore germination specific endoglucanase 270-6. This slime mold enzyme may digest
the spore cell wall during germination, to release the enclosed amoeba.

- Endoglucanases from plants such as Avocado or French bean. In plants this enzyme may be involved in the fruit
ripening process.

[0776] Two of the most conserved regions in these enzymes are centered on conserved residues which have been
shown [5,6], in the endoglucanase D from Clostridium thermocellum, to be important for the catalytic activity. The
first region contains an active site histidine and the second region contains two catalytically important residues, an
aspartate and a glutamate. Both regions were used as signature patterns.

[0777] Consensus pattern: [STV]-x-[LIVMFY]-[STV]-x(2)-G-x-[NKR]-x(4)-[PLIVM]-H-x-R [H is an active site residue]
Sequences known to belong to this class detected by the pattern: ALL, except for Cellulomonas fimi cenC and
Streptomyces reticuli cel1.

[0778] Consensus pattern: [FYW]-x-D-x(4)-[FYW]-x(3)-E-x-[STA]-x(3)-N-[STA] [D and E are active site residues]
Sequences known to belong to this class detected by the pattern: ALL, except for Fibrobacter succinogenes endA, whose
sequence seems to be incorrect.

[ 1] Beguin P. Annu. Rev. Microbiol. 44:219-248(1990).
[ 2] Gilkes N.R., Henrissat B., Kilburn D.G., Miller R.C. Jr., Warren R.A.J. Microbiol. Rev. 55:303-315(1991).
[ 3] Henrissat B., Claeyssens M., Tomme P., Lemesle L., Mornon J.-P. Gene 81:83-95(1989).
[ 4] Henrissat B. Biochem. J. 280:309-316(1991).

[ 5] Tomme P., Chauvaux S., Beguin P., Millet J., Aubert J.-P., Claeyssens M. J. Biol. Chem. 266:10313-10318
(1991).

[ 6] Tomme P., van Beeumen J., Claeyssens M. Biochem. J. 285:319-324(1992).
[0779] 258. Matrix protein (MA), p15 (GAG_P15)

[0780] The matrix protein p15 is encoded by the gag gene. MA is involved in pathogenicity [1].
[0781] [1] Pozsgay JM, Beilharz MW, Wines BD, Hess AD, Pitha PM, J Virol 1993;67:5989-5999.
[0782] 259. Gag polyprotein, inner coat protein p12 (GAG_P12)

[0783] The retroviral p12 is a virion structural protein. p12 is proline rich. The function carried out by p12 in
assembly and replication is unknown. p12 is associated with pathogenicity of the virus.

[1] Pozsgay JM, Beilharz MW, Wines BD, Hess AD, Pitha PM, J Virol 1993;67:5989-5999.

[0784] 260. Glutamine synthetase signatures (GLN-SYNT)

Glutamine synthetase (EC 6.3.1.2) (GS) [1] plays an essential role in the metabolism of nitrogen by catalyzing the
condensation of glutamate and ammonia to form glutamine. There seem to be three different classes of GS [2,3,4]:

- Class I enzymes (GSI) are specific to prokaryotes and are oligomers of 12 identical subunits. The activity of
GSI-type enzyme is controlled by the adenylation of a tyrosine residue. The adenylated enzyme is inactive.
- Class II enzymes (GSII) are found in eukaryotes and in bacteria belonging to the Rhizobiaceae, Frankiaceae and
Streptomycetaceae families (these bacteria have also a class-I GS). GSII are octamers of identical subunits. Plants
have two or more isozymes of GSII; one of the isozymes is translocated into the chloroplast.
- Class III enzymes (GSIII) have currently only been found in Bacteroides fragilis and in Butyrivibrio fibrisolvens.
GSIII is a hexamer of identical chains. It is much larger (about 700 amino acids) than the GSI (450 to 470 amino acids)
or GSII (350 to 420 amino acids) enzymes.

While the three classes of GS's are clearly structurally related, the sequence similarities are not so extensive. As
signature patterns, three conserved regions were selected. The first pattern is based on a conserved tetrapeptide in
the N-terminal section of the enzyme; the second one is based on a glycine-rich region which is thought to be involved
in ATP-binding. The third pattern is specific to class I glutamine synthetases and includes the tyrosine residue which
is reversibly adenylated.

[0785] Consensus pattern: [FYWL]-D-G-S-S-x(6,8)-[DENQSTAK]-[SA]-[DE]-x(2)-[LIVMFY]
Consensus pattern: K-P-[LIVMFYA]-x(3,5)-[NPAT]-G-[GSTAN]-G-x-H-x(3)-S
Consensus pattern: K-[LIVM]-x(5)-[LIVMA]-D-[RK]-[DN]-[LI]-Y [Y is the site of adenylation]

[ 1] Eisenberg D., Almassy R.J., Janson C.A., Chapman M.S., Suh S.W., Cascio D., Smith W.W. Cold Spring
Harbor Symp. Quant. Biol. 52:483-490(1987).

[ 2] Kumada Y., Benson D.R., Hillemann D., Hosted T.J., Rochefort D.A., Thompson C.J., Wohlleben W., Tateno
Y. Proc. Natl. Acad. Sci. U.S.A. 90:3009-3013(1993).

[ 3] Shatters R.G., Kahn M.L. J. Mol. Evol. 29:422-428(1989).

[ 4] Brown J.R., Masuchi Y., Robb F.T., Doolittle W.F. J. Mol. Evol. 38:566-576(1994).

[0786] 261. Globins profile (globin)

Globins are heme-containing proteins involved in binding and/or transporting oxygen [1]. They belong to a very large
and well studied family which is widely distributed in many organisms. The major groups of globins are:

- Hemoglobins (Hb) from vertebrates. Hb is the protein responsible for transporting oxygen from the lungs to other
tissues. It is a tetramer of two alpha and two beta chains. Most vertebrate species also express specific embryonic or
fetal forms of hemoglobin where the alpha or the beta chains are replaced by a chain with higher oxygen affinity, as
for the gamma, delta, epsilon and zeta chains in mammals, for example.
- Myoglobins (Mg) from vertebrates. Mg is a monomeric protein responsible for oxygen storage in muscles.
- Invertebrate globins [2]. A wide variety of globins are found in invertebrates. Molluscs generally have one or two
muscle globins which are either monomeric or dimeric. Insects such as the midge Chironomus thummi have a large set of
extracellular globins. Nematodes and annelids have a variety of intracellular and extracellular globins; some of them
are multi-domain polypeptides (from two up to nine-domain globins) and some produce large, disulfide-bonded
aggregates.
- Leghemoglobins (Lg) from the root nodules of leguminous plants. Lg provides oxygen for bacteroids.
- Flavohemoproteins from bacteria (Escherichia coli hmpA) and fungi [3]. These proteins consist of two distinct
domains: an N-terminal globin domain and a C-terminal FAD-containing reductase domain. In bacteria such as
Vitreoscilla, the enzyme-associated globin is a single domain protein.

All these globins seem to have evolved from a common ancestor. The profile developed to detect members of the globin
family is based on a structural alignment of selected globin sequences.

[ 1] Concise Encyclopedia Biochemistry, Second Edition, Walter de Gruyter, Berlin New-York (1988).
[ 2] Goodman M., Pedwaydon J., Czelusniak J., Suzuki T., Gotoh T., Moens L., Shishikura F., Walz D., Vinogradov S.
J. Mol. Evol. 27:236-249(1988).

[0787] Plant hemoglobins signature (globin)

Leghemoglobins [1] are hemoproteins present in the root nodules of leguminous plants. Leghemoglobins are structurally
and functionally related to hemoglobin and myoglobin. By providing oxygen to the bacteroids, they are essential for
symbiotic nitrogen fixation. Structurally related hemoglobins from the nodules of non-leguminous plants [2,3] and from
the roots of non-nodulating plants [4] have been recently sequenced. A signature pattern was developed that picks up
the sequence of plant hemoglobins exclusively.
[0788] Consensus pattern: [SN]-P-x-L-x(2)-H-A-x(3)-F

[ 1] Powell R., Gannon F. BioEssays 9:117-121(1988).

[ 2] Kortt A.A., Trinick M.J., Appleby C.A. Eur. J. Biochem. 175:141-149(1988).

[ 3] Kortt A.A., Inglis A.S., Fleming A.I., Appleby C.A. FEBS Lett. 231:341-346(1988).

[ 4] Bogusz D., Appleby C.A., Landsmann J., Dennis E.S., Trinick M.J., Peacock W.J. Nature 331:178-180(1988).

[0789] 262. Fructose-bisphosphate aldolase class-I active site (glycolytic_enzy)

[0790] Fructose-bisphosphate aldolase [1,2] is a glycolytic enzyme that catalyzes the reversible aldol cleavage or
condensation of fructose-1,6-bisphosphate into dihydroxyacetone-phosphate and glyceraldehyde 3-phosphate. There
are two classes of fructose-bisphosphate aldolases with different catalytic mechanisms. Class-I aldolases [3], mainly
found in higher eukaryotes, are homotetrameric enzymes which form a Schiff-base intermediate between the C-2
carbonyl group of the substrate (dihydroxyacetone phosphate) and the epsilon-amino group of a lysine residue. In
vertebrates, three forms of this enzyme are found: aldolase A in muscle, aldolase B in liver and aldolase C in brain.
The sequence around the lysine involved in the Schiff-base is highly conserved and can be used as a signature for
this class of enzyme.

[0791] Consensus pattern: [LIVM]-x-[LIVMFYW]-E-G-x-[LS]-L-K-P-[SN] [K is involved in Schiff-base formation]

[ 1] Perham R.N. Biochem. Soc. Trans. 18:185-187(1990).

[ 2] Marsh J.J., Lebherz H.G. Trends Biochem. Sci. 17:110-113(1992).

[ 3] Freemont P.S., Dunbar B., Fothergill-Gilmore L.A. Biochem. J. 249:779-788(1988).

[0792] 263. Glycosyl hydrolases family 11 active sites signatures

The microbial degradation of cellulose and xylans requires several types of enzymes such as endoglucanases (EC
3.2.1.4), cellobiohydrolases (EC 3.2.1.91) (exoglucanases), or xylanases (EC 3.2.1.8) [1,2]. Fungi and bacteria
produce a spectrum of cellulolytic enzymes (cellulases) and xylanases which, on the basis of sequence similarities,
can be classified into families. One of these families is known as the cellulase family G [3] or as the glycosyl
hydrolases family 11 [4,E1]. The enzymes which are currently known to belong to this family are listed below.

- Aspergillus awamori xylanase C (xynC).
- Bacillus circulans, pumilus, stearothermophilus and subtilis xylanase (xynA).
- Clostridium acetobutylicum xylanase (xynB).
- Clostridium stercorarium xylanase A (xynA).
- Fibrobacter succinogenes xylanase C (xynC), which consists of two catalytic domains that both belong to family 10.
- Neocallimastix patriciarum xylanase A (xynA).
- Ruminococcus flavefaciens bifunctional xylanase XYLA (xynA). This protein consists of three domains: an N-terminal
xylanase catalytic domain that belongs to family 11 of glycosyl hydrolases, a central domain composed of short
repeats of Gln, Asn and Trp, and a C-terminal xylanase catalytic domain that belongs to family 10 of glycosyl
hydrolases.
- Schizophyllum commune xylanase A.
- Streptomyces lividans xylanases B (xlnB) and C (xlnC).
- Trichoderma reesei xylanases I and II.

Two of the conserved regions in these enzymes are centered on glutamic acid residues which have both been shown [5],
in Bacillus pumilus xylanase, to be necessary for catalytic activity. Both regions were used as signature patterns.

[0793] Consensus pattern: [PSA]-[LQ]-x-E-Y-Y-[LIVM](2)-[DE]-x-[FYWHN] [E is an active site residue]
Consensus pattern: [LIVMF]-x(2)-E-[AG]-[YWG]-[QRFGS]-[SG]-[STAN]-G-x-[SAF] [E is an active site residue]

[ 1] Beguin P. Annu. Rev. Microbiol. 44:219-248(1990).

[ 2] Gilkes N.R., Henrissat B., Kilburn D.G., Miller R.C. Jr., Warren R.A.J. Microbiol. Rev. 55:303-315(1991).
[ 3] Henrissat B., Claeyssens M., Tomme P., Lemesle L., Mornon J.-P. Gene 81:83-95(1989).
[ 4] Henrissat B. Biochem. J. 280:309-316(1991).

[ 5] Ko E.P., Akatsuka H., Moriyama H., Shinmyo A., Hata Y., Katsube Y., Urabe I., Okada H. Biochem. J. 288:
117-121(1992).

[0794] 264. Glycosyl hydrolase family 14
[0795] This family consists of beta amylases.
[0796] 265. Glycosyl hydrolases family 1 signatures

It has been shown [1 to 4] that the following glycosyl hydrolases can be, on the basis of sequence similarities,
classified into a single family:

- Beta-glucosidases (EC 3.2.1.21) from various bacteria such as Agrobacterium strain ATCC 21400, Bacillus polymyxa,
and Caldocellum saccharolyticum.
- Two plants (clover) beta-glucosidases (EC 3.2.1.21).
- Two different beta-galactosidases (EC 3.2.1.23) from the archaebacteria Sulfolobus solfataricus (genes bgaS and
lacS).
- 6-phospho-beta-galactosidases (EC 3.2.1.85) from various bacteria such as Lactobacillus casei, Lactococcus lactis,
and Staphylococcus aureus.
- 6-phospho-beta-glucosidases (EC 3.2.1.86) from Escherichia coli (genes bglB and ascB) and from Erwinia
chrysanthemi (gene arbB).
- Plants myrosinases (EC 3.2.3.1) (sinigrinase) (thioglucosidase).
- Mammalian lactase-phlorizin hydrolase (LPH) (EC 3.2.1.108 / EC 3.2.1.62). LPH, an integral membrane glycoprotein,
is the enzyme that splits lactose in the small intestine. LPH is a large protein of about 1900 residues which contains
four tandem repeats of a domain of about 450 residues which is evolutionarily related to the above glycosyl
hydrolases.

One of the conserved regions in these enzymes is centered on a conserved glutamic acid residue which has been shown
[5], in the beta-glucosidase from Agrobacterium, to be directly involved in glycosidic bond cleavage by acting as a
nucleophile. This region was used as a signature pattern. As a second signature pattern, a conserved region found in
the N-terminal extremity of these enzymes was selected; this region also contains a glutamic acid residue.

[0797] Consensus pattern: [LIVMFSTC]-[LIVFYS]-[LIV]-[LIVMST]-E-N-G-[LIVMFAR]-[CSAGN] [E is the active site
residue]

Note: this pattern will pick up the last two domains of LPH; the first two domains, which are removed from the LPH
precursor by proteolytic processing, have lost the active site glutamate and may therefore be inactive [4].
[0798] Consensus pattern: F-x-[FYWM]-[GSTA]-x-[GSTA]-x-[GSTA](2)-[FYNH]-[NQ]-x-E-x-[GSTA]

[ 1] Henrissat B. Biochem. J. 280:309-316(1991).
[ 2] Henrissat B. Protein Seq. Data Anal. 4:61-62(1991).
[ 3] Gonzalez-Candelas L., Ramon D., Polaina J. Gene 95:31-38(1990).
[ 4] El Hassouni M., Henrissat B., Chippaux M., Barras F. J. Bacteriol. 174:765-777(1992).

[ 5] Withers S.G., Warren R.A.J., Street I.P., Rupitz K., Kempton J.B., Aebersold R. J. Am. Chem. Soc. 112:
5887-5889(1990).

[0799] 266. Glycosyl hydrolases family 2 signatures

It has been shown [1,2,E1] that the following glycosyl hydrolases can be, on the basis of sequence similarities,
classified into a single family:

- Beta-galactosidases (EC 3.2.1.23) from bacteria such as Escherichia coli (genes lacZ and ebgA), Clostridium
acetobutylicum, Clostridium thermosulfurogenes, Klebsiella pneumoniae, Lactobacillus delbrueckii, or Streptococcus
thermophilus and from the fungi Kluyveromyces lactis.
- Beta-glucuronidase (EC 3.2.1.31) from Escherichia coli (gene uidA) and from mammals.

One of the conserved regions in these enzymes is centered on a conserved glutamic acid residue which has been shown
[3], in Escherichia coli lacZ, to be the general acid/base catalyst in the active site of the enzyme. This region was
used as a signature pattern. As a second signature pattern, a highly conserved region was selected, located some sixty
residues upstream from the active site glutamate.
[0800] Consensus pattern: N-x-[LIVMFYWD]-R-[STACN](2)-H-Y-P-x(4)-[LIVMFYWS](2)-x(3)-[DN]-x(2)-G-[LIVMFYW](4)

Consensus pattern: [DENQLF]-[KRVW]-N-[HRY]-[STAPV]-[SAC]-[LIVMFS](3)-W-[GS]-x(2,3)-N-E [E is the active site
residue]

[ 1] Henrissat B. Biochem. J. 280:309-316(1991).

[ 2] Schroeder C.J., Robert C., Lenzen G., McKay L.L., Mercenier A. J. Gen. Microbiol. 137:369-380(1991).
[ 3] Gebler J.C., Aebersold R., Withers S.G. J. Biol. Chem. 267:11126-11130(1992).

[0801] 267. Glycosyl hydrolases family 3 active site

It has been shown [1,2] that the following glycosyl hydrolases can be, on the basis of sequence similarities,
classified into a single family:

- Beta glucosidases (EC 3.2.1.21) from the fungi Aspergillus wentii (A-3), Hansenula anomala, Kluyveromyces
fragilis, Saccharomycopsis fibuligera (BGL1 and BGL2), Schizophyllum commune and Trichoderma reesei (BGL1).
- Beta glucosidases from the bacteria Agrobacterium tumefaciens (Cbg1), Butyrivibrio fibrisolvens (bglA),
Clostridium thermocellum (bglB), Escherichia coli (bglX), Erwinia chrysanthemi (bgxA) and Ruminococcus albus.
- Alteromonas strain O-7 beta-hexosaminidase A (EC 3.2.1.52).

- Bacillus subtilis hypothetical protein yzbA.

- Escherichia coli hypothetical protein ycfO and HI0959, the corresponding Haemophilus influenzae protein.

One of the conserved regions in these enzymes is centered on a conserved aspartic acid residue which has been
shown [3], in Aspergillus wentii beta-glucosidase A3, to be implicated in the catalytic mechanism. This region was
used as a signature pattern.

[0802] Consensus pattern: [LIVM](2)-[KR]-x-[EQK]-x(4)-G-[LIVMFT]-[LIVT]-[LIVMF]-[ST]-D-x(2)-[SGADNI] [D is the
active site residue]

[ 1] Henrissat B. Biochem. J. 280:309-316(1991).

[ 2] Castle L.A., Smith K.D., Morris R.O. J. Bacteriol. 174:1478-1486(1992).
[ 3] Bause E., Legler G. Biochim. Biophys. Acta 626:459-465(1980).

10 

[0803] 208. Glycosyl hydrolases family 8 signature

The microbial degradation of cellulose and xylans requires several types of enzymes such as endoglucanases (EC 3.2.1.4), cellobiohydrolases (EC 3.2.1.91) (exoglucanases), or xylanases (EC 3.2.1.8) [1,2]. Fungi and bacteria produce a spectrum of cellulolytic enzymes (cellulases) and xylanases which, on the basis of sequence similarities, can be classified into families. One of these families is known as the cellulase family D [3] or as the glycosyl hydrolases family 8 [4,E1]. The enzymes which are currently known to belong to this family are listed below. - Acetobacter xylinum endonuclease cmcAX. - Bacillus strain KSM-330 acidic endonuclease K (endo-K). - Cellulomonas josui endoglucanase 2 (celB). - Cellulomonas uda endoglucanase. - Clostridium cellulolyticum endoglucanases C (celCCC). - Clostridium thermocellum endoglucanases A (celA). - Erwinia chrysanthemi minor endoglucanase Y (celY). - Bacillus circulans beta-glucanase (EC 3.2.1.73). - Escherichia coli hypothetical protein yhjM. The most conserved region in these enzymes is a stretch of about 20 residues that contains two conserved aspartates. The first aspartate is thought [5] to act as the nucleophile in the catalytic mechanism. This region was used as a signature pattern.
Consensus pattern: A-[ST]-D-[AG]-D-x(2)-[IM]-A-x-[SA]-[LIVM]-[LIVMG]-x-A-x(3)-[FW] [The first D is an active site residue]
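The consensus patterns in this section follow the PROSITE notation: elements are separated by hyphens, `x` stands for any residue, `[..]` lists the residues allowed at a position, `{..}` lists the residues excluded, and `(n)` or `(n,m)` gives a repeat count. The following is a minimal sketch, not part of the original disclosure, of how such a pattern could be translated into a regular expression and searched against a sequence. The function name `prosite_to_regex` and the test peptide are hypothetical and chosen only for illustration.

```python
import re

# One pattern element: optional '<' anchor, a core (x, a single residue,
# [allowed residues] or {excluded residues}), an optional repeat count
# "(n)" or "(n,m)", and an optional '>' anchor.
_ELEMENT = re.compile(
    r"(<?)(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?(>?)"
)


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style consensus pattern into a regex string."""
    parts = []
    # Tolerate a trailing '.' or '-' as found after some printed patterns.
    for raw in pattern.strip().rstrip(".-").split("-"):
        match = _ELEMENT.fullmatch(raw.strip())
        if match is None:
            raise ValueError(f"unrecognized pattern element: {raw!r}")
        start, core, low, high, end = match.groups()
        if core == "x":
            token = "."                      # any residue
        elif core.startswith("{"):
            token = "[^" + core[1:-1] + "]"  # excluded residues
        else:
            token = core                     # residue or [allowed set]
        if low is not None:
            token += "{" + low + ("," + high if high else "") + "}"
        parts.append(("^" if start else "") + token + ("$" if end else ""))
    return "".join(parts)


# Family 8 signature as given in the text above.
FAMILY8 = "A-[ST]-D-[AG]-D-x(2)-[IM]-A-x-[SA]-[LIVM]-[LIVMG]-x-A-x(3)-[FW]"
regex = re.compile(prosite_to_regex(FAMILY8))

# Synthetic peptide (an assumption, not a real protein) built to contain a hit.
hit = regex.search("MKASDADKKIAKSLLKAKKKFGG")
print(hit.start(), hit.group())
```

A hit only indicates that the sequence is compatible with the signature; it is not evidence of family membership on its own.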

[1] Beguin P. Annu. Rev. Microbiol. 44:219-248(1990).
[2] Gilkes N.R., Henrissat B., Kilburn D.G., Miller R.C. Jr., Warren R.A.J. Microbiol. Rev. 55:303-315(1991).
[3] Henrissat B., Claeyssens M., Tomme P., Lemesle L., Mornon J.-P. Gene 81:83-95(1989).
[4] Henrissat B. Biochem. J. 280:309-316(1991).
[5] Alzari P.M., Souchon H., Dominguez R. Structure 4:265-275(1996).

[0804] 209. Glycosyl hydrolases family 9 active sites signatures

The microbial degradation of cellulose and xylans requires several types of enzymes such as endoglucanases (EC 3.2.1.4), cellobiohydrolases (EC 3.2.1.91) (exoglucanases), or xylanases (EC 3.2.1.8) [1,2]. Fungi and bacteria produce a spectrum of cellulolytic enzymes (cellulases) and xylanases which, on the basis of sequence similarities, can be classified into families. One of these families is known as the cellulase family E [3] or as the glycosyl hydrolases family 9 [4,E1]. The enzymes which are currently known to belong to this family are listed below. - Butyrivibrio fibrisolvens cellodextrinase 1 (ced1). - Cellulomonas fimi endoglucanases B (cenB) and C (cenC). - Clostridium cellulolyticum endoglucanase G (celCCG). - Clostridium cellulovorans endoglucanase C (engC). - Clostridium stercorarium endoglucanase Z (avicelase I) (celZ). - Clostridium thermocellum endoglucanases D (celD), F (celF) and I (celI). - Fibrobacter succinogenes endoglucanase A (endA). - Pseudomonas fluorescens endoglucanase A (celA). - Streptomyces reticuli endoglucanase 1 (cel1). - Thermomonospora fusca endoglucanase E-4 (celD). - Dictyostelium discoideum spore germination specific endoglucanase 270-6. This slime mold enzyme may digest the spore cell wall during germination, to release the enclosed amoeba. - Endoglucanases from plants such as avocado or French bean. In plants this enzyme may be involved in the fruit ripening process. Two of the most conserved regions in these enzymes are centered on conserved residues which have been shown [5,6], in the endoglucanase D from Clostridium thermocellum, to be important for the catalytic activity. The first region contains an active site histidine and the second region contains two catalytically important residues: an aspartate and a glutamate. Both regions were used as signature patterns.
[0805] Consensus pattern: [STV]-x-[LIVMFY]-[STV]-x(2)-G-x-[NKR]-x(4)-[PLIVM]-H-x-R [H is an active site residue]

Consensus pattern: [FYW]-x-D-x(4)-[FYW]-x(3)-E-x-[STA]-x(3)-M-[STA] [D and E are active site residues]

[1] Beguin P. Annu. Rev. Microbiol. 44:219-248(1990).
[2] Gilkes N.R., Henrissat B., Kilburn D.G., Miller R.C. Jr., Warren R.A.J. Microbiol. Rev. 55:303-315(1991).
[3] Henrissat B., Claeyssens M., Tomme P., Lemesle L., Mornon J.-P. Gene 81:83-95(1989).
[4] Henrissat B. Biochem. J. 280:309-316(1991).
[5] Tomme P., Chauvaux S., Beguin P., Millet J., Aubert J.-P., Claeyssens M. J. Biol. Chem. 266:10313-10318(1991).
[6] Tomme P., van Beeumen J., Claeyssens M. Biochem. J. 285:319-324(1992).

[0806] 270. Glyceraldehyde 3-phosphate dehydrogenase active site (gpdh)

Glyceraldehyde 3-phosphate dehydrogenase (EC 1.2.1.12) (GAPDH) [1] is a tetrameric NAD-binding enzyme common to both the glycolytic and gluconeogenic pathways. A cysteine in the middle of the molecule is involved in forming a covalent phosphoglycerol thioester intermediate. The sequence around this cysteine is totally conserved in eubacterial and eukaryotic GAPDHs and is also present, albeit in a variant form, in the otherwise highly divergent archaebacterial GAPDH [2]. Escherichia coli D-erythrose 4-phosphate dehydrogenase (E4PDH) (gene epd or gapB) is an enzyme highly related to GAPDH [3].

[0807] Consensus pattern: [ASV]-S-C-[NT]-T-x(2)-[LIM] [C is the active site residue]

[1] Harris J.I., Waters M. (In) The Enzymes (3rd edition) 13:1-50(1976).
[2] Fabry S., Lang J., Niermann T., Vingron M., Hensel R. Eur. J. Biochem. 179:405-413(1989).
[3] Zhao G., Pease A.J., Bharani M., Winkler M.E. J. Bacteriol. 177:2804-2812(1995).

[0808] 271. Granulins signature
Granulins [1] are a family of cysteine-rich peptides of about 6 Kd which may have multiple biological activity. A precursor protein (known as acrogranin) potentially encodes seven different forms of granulin (grnA to grnG) which are probably released by post-translational proteolytic processing. A schematic representation of the structure of a granulin is shown below:

[Schematic of the cysteine spacing in a granulin; not legible in the source.]

'C': conserved cysteine probably involved in a disulfide bond. '*': position of the pattern. Granulins are evolutionarily related to PMP-D1, a peptide extracted from the pars intercerebralis of migratory locusts [2].

[0809] Consensus pattern: C-x-D-x(2)-H-C-C-P-x(4)-C [The four C's are probably involved in disulfide bonds]

[1] Bhandari V., Palfree R.G., Bateman A. Proc. Natl. Acad. Sci. U.S.A. 89:1715-1719(1992).
[2] Nakakura N., Hietter H., van Dorsselaer A., Luu B. Eur. J. Biochem. 204:147-153(1992).



[0810] 272. (HCV RdRp) Hepatitis C virus RNA dependent RNA polymerase

[0811] The RNA dependent RNA polymerase is also known as nonstructural protein NS5B. NS5B is a 65 kDa protein that resembles other viral RNA polymerases. HCV replication is thought to occur in membrane bound replication complexes. These complexes transcribe the positive strand and the resulting minus strand is used as a template for the synthesis of genomic RNA. There are two viral proteins involved in the reaction, NS3 and NS5B [1,2].

[0812] [1] Lohmann V, Korner F, Herian U, Bartenschlager R; J Virol 1997;71:8416-8428. [2] Behrens SE, Tomei L, De Francesco R; EMBO J 1996;15:12-22. [3] Ishido S, Fujita T, Hotta H; Biochem Biophys Res Commun 1998;244:35-40.

[0813] 273. (HHH) Helix-hairpin-helix motif.

[0814] [1] Doherty AJ, Serpell LC, Ponting CP; Nucleic Acids Res 1996;24:2488-2497.
[0815] 274. HIT family signature

Recently a family of small proteins of about 12 to 16 Kd has been described [1]. This family currently consists of: - Mammalian protein HINT (also known as Protein kinase C inhibitor 1 or PKCI-1). HINT was incorrectly thought to be a specific inhibitor of PKC. It has been shown to bind zinc. - Fission yeast diadenosine 5',5'''-P1,P4-tetraphosphate asymmetrical hydrolase (Ap4Aase) (EC 3.6.1.17) [2] (gene aph1), which cleaves A-5'-PPPP-5'A to yield AMP and ATP. - FHIT, a human protein whose gene is altered in different tumors and which acts [3] as a diadenosine 5',5'''-P1,P3-triphosphate hydrolase (Ap3Aase) (EC 3.6.1.29), cleaving A-5'-PPP-5'A to yield AMP and ADP. - Yeast proteins HNT1 and HNT2. - Maize zinc-binding protein ZBP14. - Escherichia coli hypothetical protein ycfF. - Haemophilus influenzae hypothetical protein HI0961. - Helicobacter pylori hypothetical protein HP0404. - Methanococcus jannaschii hypothetical protein MJ0866. - Mycobacterium leprae hypothetical protein U296A. - Synechocystis strain PCC 6803 hypothetical protein slr1234. - Caenorhabditis elegans hypothetical protein F21C3.3. - A hypothetical 13.2 Kd protein in hisE 3'region in Azospirillum brasilense. - A hypothetical 13.1 Kd protein in p37 5'region in Mycoplasma hyorhinis. - A hypothetical 12.4 Kd protein in psbAII 5'region in Synechococcus strain PCC 7942. All these proteins contain a region with three clustered histidines. This region is responsible for the designation of this family: HIT, for 'HIstidine Triad' [1]. This region was originally thought to be implicated in the binding of a zinc ion but was later identified [4] as part of the alpha-phosphate binding site of a nucleotide-binding domain. As a signature pattern, the region of the histidine triad was selected.
[0816] Consensus pattern: [NQA]-x(4)-[GAV]-x-[QF]-x-[LIVM]-x-H-[LIVMFYT]-H-[LIVMFT]-H-[LIVMF](2)-[PSGA]

[1] Seraphin B. DNA Seq. 3:177-179(1992).
[2] Huang Y., Garrison P.N., Barnes L.D. Biochem. J. 312:925-932(1995).
[3] Barnes L.D., Garrison P.N., Siprashvili Z., Guranowski A., Robinson A.K., Ingram S.W., Croce C.M., Ohta M., Huebner K. Biochemistry 35:11529-11535(1996).
[4] Brenner C., Garrison P., Gilmour J., Peisach D., Ringe D., Petsko G.A., Lowenstein J.M. Nat. Struct. Biol. 4:231-238(1997).

[0817] 275. Myc-type, 'helix-loop-helix' dimerization domain signature (HLH)

A number of eukaryotic proteins, which probably are sequence specific DNA-binding proteins that act as transcription factors, share a conserved domain of 40 to 50 amino acid residues. It has been proposed [1] that this domain is formed of two amphipathic helices joined by a variable length linker region that could form a loop. This 'helix-loop-helix' (HLH) domain mediates protein dimerization and has been found in the proteins listed below [2,3,E1,E2]. Most of these proteins have an extra basic region of about 15 amino acid residues that is adjacent to the HLH domain and specifically binds to DNA. They are referred to as basic helix-loop-helix proteins (bHLH), and are classified in two groups: class A (ubiquitous) and class B (tissue-specific). Members of the bHLH family bind variations on the core sequence 'CANNTG', also referred to as the E-box motif. The homo- or heterodimerization mediated by the HLH domain is independent of, but necessary for, DNA binding, as two basic regions are required for DNA binding activity. The HLH proteins lacking the basic domain (Emc, Id) function as negative regulators since they form heterodimers, but fail to bind DNA. The hairy-related proteins (hairy, E(spl), deadpan) also repress transcription although they can bind DNA. The proteins of this subfamily act together with co-repressor proteins, like groucho, through their C-terminal motif WRPW. - The myc family of cellular oncogenes [4], which is currently known to contain four members: c-myc [E3], N-myc, L-myc, and B-myc. The myc genes are thought to play a role in cellular differentiation and proliferation. - Proteins involved in myogenesis (the induction of muscle cells). In mammals MyoD1 (Myf-3), myogenin (Myf-4), Myf-5, and Myf-6 (Mrf4 or herculin), in birds CMD1 (QMF-1), in Xenopus MyoD and MF25, in Caenorhabditis elegans CeMyoD, and in Drosophila nautilus (nau). - Vertebrate proteins that bind specific DNA sequences ('E boxes') in various immunoglobulin chains enhancers: E2A or ITF-1 (E12/pan-2 and E47/pan-1), ITF-2 (tcf4), TFE3, and TFEB. - Vertebrate neurogenic differentiation factor 1, which acts as a differentiation factor during neurogenesis. - Vertebrate MAX protein, a transcription regulator that forms a sequence-specific DNA-binding protein complex with myc or mad. - Vertebrate Max Interacting Protein 1 (MXI1 protein), which acts as a transcriptional repressor and may antagonize myc transcriptional activity by competing for max. - Proteins of the bHLH/PAS superfamily which are transcriptional activators. In mammals, AH receptor nuclear translocator (ARNT), single-minded homologs (SIM1 and SIM2), hypoxia-inducible factor 1 alpha (HIF1A), AH receptor (AHR), neuronal pas domain proteins (NPAS1 and NPAS2), endothelial pas domain protein 1 (EPAS1), mouse ARNT2, and human BMAL1. In Drosophila, single-minded (SIM), AH receptor nuclear translocator (ARNT), trachealess protein (TRH), and similar protein (SIMA). - Mammalian transcription factors HES, which repress transcription by acting on two types of DNA sequences, the E box and the N box. - Mammalian MAD protein (max dimerizer), which acts as a transcriptional repressor and may antagonize myc transcriptional activity by competing for max. - Mammalian Upstream Stimulatory Factor 1 and 2 (USF1 and USF2), which bind to a symmetrical DNA sequence that is found in a variety of viral and cellular promoters. - Human lyl-1 protein, which is involved, by chromosomal translocation, in T-cell leukemia. - Human transcription factor AP-4. - Mouse helix-loop-helix proteins MATH-1 and MATH-2, which activate E box-dependent transcription in collaboration with E47. - Mammalian stem cell protein (SCL) (also known as tal1), a protein which may play an important role in hemopoietic differentiation. SCL is involved, by chromosomal translocation, in stem-cell leukemia. - Mammalian proteins Id1 to Id4 [5]. Id (inhibitor of DNA binding) proteins lack a basic DNA-binding domain but are able to form heterodimers with other HLH proteins, thereby inhibiting binding to DNA. - Drosophila extra-macrochaetae (emc) protein, which participates in sensory organ patterning by antagonizing the neurogenic activity of the achaete-scute complex. Emc is the homolog of mammalian Id proteins. - Human Sterol Regulatory Element Binding Protein 1 (SREBP-1), a transcriptional activator that binds to the sterol regulatory element 1 (SRE-1) found in the flanking region of the LDLR gene and in other genes. - Drosophila achaete-scute (AS-C) complex proteins T3 (l'sc), T4 (scute), T5 (achaete) and T8 (asense). The AS-C proteins are involved in the determination of the neuronal precursors in the peripheral nervous system and the central nervous system. - Mammalian homologs of achaete-scute proteins, the MASH-1 and MASH-2 proteins. - Drosophila atonal protein (ato), which is involved in neurogenesis. - Drosophila daughterless (da) protein, which is essential for neurogenesis and sex-determination. - Drosophila deadpan (dpn), a hairy-like protein involved in the functional differentiation of neurons. - Drosophila delilah (dei) protein, which plays an important role in the differentiation of epidermal cells into muscle. - Drosophila hairy (h) protein, a transcriptional repressor which regulates the embryonic segmentation and adult bristle patterning. - Drosophila enhancer of split proteins E(spl), which are hairy-like proteins active during neurogenesis and also act as transcriptional repressors. - Drosophila twist (twi) protein, which is involved in the establishment of germ layers in embryos. - Maize anthocyanin regulatory proteins R-S and LC. - Yeast centromere-binding protein 1 (CPF1 or CBF1). This protein is involved in chromosomal segregation. It binds to a highly conserved DNA sequence found in centromeres and in several promoters. - Yeast INO2 and INO4 proteins. - Yeast phosphate system positive regulatory protein PHO4, which interacts with the upstream activating sequence of several acid phosphatase genes. - Yeast serine-rich protein TYE7, which is required for ty-mediated ADH2 expression. - Neurospora crassa nuc-1, a protein that activates the transcription of structural genes for phosphorus acquisition. - Fission yeast protein esc1, which is involved in the sexual differentiation process. The schematic representation of the helix-loop-helix domain is shown here:

[Amphipathic helix 1] - [Loop] - [Amphipathic helix 2]

The signature pattern developed to detect this domain spans completely the second amphipathic helix.
[0818] Consensus pattern: [DENSTAP]-[KTR]-[LIVMAGSNT]-{FYWCPHKR}-[LIVMT]-[LIVM]-x(2)-[STAV]-[LIVMSTACKR]-x-[VMFYH]-[LIVMTA]-{P}-{P}-[LIVMRKHQ]

[1] Murre C., McCaw P.S., Baltimore D. Cell 56:777-783(1989).
[2] Garrel J., Campuzano S. BioEssays 13:493-498(1991).
[3] Kato G.J., Dang C.V. FASEB J. 6:3065-3072(1992).
[4] Krause M., Fire A., Harrison S.W., Priess J., Weintraub H. Cell 63:907-919(1990).
[5] Riechmann V., van Cruechten I., Sablitzky F. Nucleic Acids Res. 22:749-755(1994).
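The E-box core sequence 'CANNTG' named in the bHLH description can be located in a DNA sequence with a short scan. The sketch below is illustrative only: the function name `find_e_boxes` and the example sequence are hypothetical and do not come from the original text. Because the reverse complement of CANNTG is again CANNTG, every site on the opposite strand coincides with a site on the scanned strand, so one strand is enough.

```python
import re

# A lookahead keeps the scan from consuming characters, so overlapping
# sites would still be reported; the capture group holds the hexamer.
E_BOX = re.compile(r"(?=(CA[ACGT]{2}TG))")


def find_e_boxes(dna: str):
    """Return (0-based start, hexamer) for every CANNTG site in dna."""
    return [(m.start(), m.group(1)) for m in E_BOX.finditer(dna.upper())]


# Made-up promoter fragment with two E boxes (CACGTG and CAGCTG).
print(find_e_boxes("ggCACGTGttCAGCTGaa"))
```

The scan reports sequence matches only; whether a given bHLH dimer binds a particular E-box variant depends on the central NN bases and flanking context, which this sketch does not model.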

[0819] 276. HMG14 and HMG17 signature
High mobility group (HMG) proteins are a family of relatively low molecular weight nonhistone components in chromatin. HMG14 and HMG17 [1], two related proteins of about 100 amino acid residues, bind to the inner side of the nucleosomal DNA, thus altering the interaction between the DNA and the histone octamer. These two proteins may be involved in the process which maintains transcribable genes in a unique chromatin conformation. The trout nonhistone chromosomal protein H6 (histone T) also belongs to this family. As a signature pattern, a conserved stretch of 10 residues located in the N-terminal section of HMG14 and HMG17 was selected.
[0820] Consensus pattern: R-R-S-A-R-L-S-A-[RK]-P

[0821] [1] Bustin M., Reeves R. Prog. Nucleic Acid Res. Mol. Biol. 54:35-100(1996).
[0822] 277. Hydroxymethylglutaryl-coenzyme A lyase active site (HMGL1)

3-hydroxy-3-methylglutaryl-coenzyme A lyase (HMG-CoA lyase or HL) (EC 4.1.3.4) catalyzes the transformation of HMG-CoA into acetyl-CoA and acetoacetate. In vertebrates it is a mitochondrial enzyme which is involved in ketogenesis and in leucine catabolism [1]. In some bacteria, such as Pseudomonas mevalonii, it is involved in mevalonate catabolism (gene mvaB). A cysteine has been shown [2], in mvaB, to be required for the activity of the enzyme. The region around this residue is perfectly conserved and is used as a signature pattern.
[0823] Consensus pattern: S-V-A-G-L-G-G-C-F-Y [C is the active site residue]

[1] Mitchell G.A., Robert M.-F., Hruz P.W., Wang S., Fontaine G., Behnke C.E., Mende-Mueller L.M., Schappert K., Lee C., Gibson K.M., Miziorko H.M. J. Biol. Chem. 268:4376-4381(1993).
[2] Hruz P.W., Narasimhan C., Miziorko H.M. Biochemistry 31:6842-6847(1992).

[0824] Alpha-isopropylmalate and homocitrate synthases signatures (HMGL2)

The following enzymes have been shown [1] to be functionally as well as evolutionarily related: - Alpha-isopropylmalate synthase (EC 4.1.3.12), which catalyzes the first step in the biosynthesis of leucine, the condensation of acetyl-CoA and alpha-ketoisovalerate to form 2-isopropylmalate. - Homocitrate synthase (EC 4.1.3.21) (gene nifV), which is involved in the biosynthesis of the iron-molybdenum cofactor of nitrogenase and catalyzes the condensation of acetyl-CoA and alpha-ketoglutarate into homocitrate. - Soybean late nodulin 56. - Methanococcus jannaschii hypothetical proteins MJ0503, MJ1195 and MJ1392. Two conserved regions were selected as signature patterns for these enzymes. The first region is located in the N-terminal section while the second region is located in the central section and contains two conserved histidine residues which could be implicated in the catalytic mechanism.
[0825] Consensus pattern: L-R-[DE]-G-x-Q-x(10)-K

Consensus pattern: [LIVMFW]-x(2)-H-x-H-[DN]-D-x-G-x-[GAS]-x-[GASLI]

[0826] [1] Wang S.-Z., Dean D.R., Chen J.-S., Johnson J.L. J. Bacteriol. 173:3041-3046(1991).
[0827] 278. (HMG_CoA_synt) Hydroxymethylglutaryl-coenzyme A synthase active site

Hydroxymethylglutaryl-coenzyme A synthase (EC 4.1.3.5) (HMG-CoA synthase) catalyzes the condensation of acetyl-CoA with acetoacetyl-CoA to produce HMG-CoA and CoA [1]. In vertebrates there are two isozymes located in different subcellular compartments: a cytosolic form, which is the starting point of the mevalonate pathway that leads to cholesterol and other sterolic and isoprenoid compounds, and a mitochondrial form responsible for ketone body biosynthesis. HMG-CoA synthase is also found in other eukaryotes such as insects, plants and fungi. A cysteine is known to act as the catalytic nucleophile in the first step of the reaction, the acetylation of the enzyme by acetyl-CoA. The conserved region around this active site residue was used as a signature pattern.

[0828] Consensus pattern: N-x-[DN]-[IV]-E-G-[IV]-D-x(2)-N-A-C-[FY]-x-G [C is the active site residue]

[0829] [1] Rokosz L.L., Boulton D.A., Butkiewicz E.A., Sanyal G., Cueto M.A., Lachance P.A., Hermes J.D. Arch. Biochem. Biophys. 312:1-13(1994).

[0830] 279. HMG (high mobility group) box
[0831] 280. HSF-type DNA-binding domain signature

[0832] Heat shock factor (HSF) is a DNA-binding protein that specifically binds heat shock promoter elements (HSE). HSE is a palindromic element rich with repetitive purine and pyrimidine motifs: 5'-nGAAnnTTCnnGAAnnTTCn-3'. HSF is expressed at normal temperatures but is activated by heat shock or chemical stressors [1,2]. The sequences of HSF from various species show extensive similarity in a region of about 90 amino acids, which has been shown [3] to bind DNA. Some other proteins also contain an HSF domain; these are: - Yeast SFL1, a protein involved in cell surface assembly and regulation of the gene related to flocculation (asexual cell aggregation) [4]. - Yeast transcription factor SKN7 (or BRY1 or POS9), which binds to the promoter elements SCB and MCB essential for the control of G1 cyclins expression [5]. - Yeast MGA1. - Yeast hypothetical protein YJR147w. A pattern was derived from the most conserved part of the HSF DNA-binding domain, its central region.

[0833] Consensus pattern: L-x(3)-[FY]-K-H-x-N-x-[STN]-S-F-[LIVM]-R-Q-L-[NH]-x-Y-x-[FYW]-[RKH]-K-[LIVM]

[1] Sorger P.K. Cell 65:363-366(1991).
[2] Mager W.H., Moradas Ferreira P. Biochem. J. 290:1-13(1993).
[3] Vuister G.W., Kim S.-J., Orosz A., Marquardt J., Wu C., Bax A. Nat. Struct. Biol. 1:605-613(1994).
[4] Fujita A., Kikuchi Y., Kuhara S., Misumi Y., Matsumoto S., Kobayashi H. Gene 85:321-328(1989).
[5] Morgan B.A., Bouquin N., Merrill G.F., Johnston L.H. EMBO J. 14:5679-5689(1995).
[0834] 281. Heat shock hsp20 proteins family profile

Prokaryotic and eukaryotic organisms respond to heat shock or other environmental stress by inducing the synthesis of proteins collectively known as heat-shock proteins (hsp) [1]. Amongst them is a family of proteins with an average molecular weight of 20 Kd, known as the hsp20 proteins [2 to 5]. These seem to act as chaperones that can protect other proteins against heat-induced denaturation and aggregation. Hsp20 proteins seem to form large heterooligomeric aggregates. This family is currently composed of the following members: - Vertebrate heat shock protein hsp27 (hsp25), induced by a variety of environmental stresses. - Drosophila heat shock proteins hsp22, hsp23, hsp26, hsp27, hsp67BA and BC. - Caenorhabditis elegans hsp16 multigene family. - Fungal HSP26 (budding yeast) and hsp30 (Neurospora crassa and Aspergillus nidulans). - Plant small hsp's. Plants have four classes of hsp20: classes I and II which are cytoplasmic, class III which is chloroplastic and class IV which is found in the endomembrane. - Alpha-crystallin A and B chains. Alpha-crystallin is an abundant constituent of the eye lens of most vertebrate species. Its main function appears to be to maintain the correct refractive index of the lens. It is also found in other tissues where it seems to act as a chaperone [6]. - Schistosoma mansoni major egg antigen p40. Structurally, p40 is built of two tandem hsp20 domains. - A variety of prokaryotic proteins: ibpA and ibpB from Escherichia coli, hsp18 from Clostridium acetobutylicum, spore protein SP21 (hspA) from Stigmatella aurantiaca, Mycobacterium leprae 18 Kd antigen and Mycobacterium tuberculosis 14 Kd antigen. - Methanococcus jannaschii hypothetical protein MJ0285. Structurally, this family is characterized by the presence of a conserved C-terminal domain of about 100 residues. The profile developed to detect members of the hsp20 family is based on an alignment of this domain.
[0835] - Sequences known to belong to this class detected by the profile: ALL.

[1] Lindquist S., Craig E.A. Annu. Rev. Genet. 22:631-677(1988).
[2] de Jong W.W., Leunissen J.A.M., Voorter C.E.M. Mol. Biol. Evol. 10:103-126(1993).
[3] Caspers G.J., Leunissen J.A.M., de Jong W.W. J. Mol. Evol. 40:238-248(1995).
[4] Jaenicke R., Creighton T.E. Curr. Biol. 3:234-235(1993).
[5] Jakob U., Buchner J. Trends Biochem. Sci. 19:205-211(1994).
[6] Groenen P.J.T.A., Merck K.B., de Jong W.W., Bloemendal H. Eur. J. Biochem. 225:1-19(1994).
[0836] 282. Heat shock hsp70 proteins family signatures

[0837] Prokaryotic and eukaryotic organisms respond to heat shock or other environmental stress by the induction of the synthesis of proteins collectively known as heat-shock proteins (hsp) [1]. Amongst them is a family of proteins with an average molecular weight of 70 Kd, known as the hsp70 proteins [2,3,4]. In most species, there are many proteins that belong to the hsp70 family. Some of them are expressed under unstressed conditions. Hsp70 proteins can be found in different cellular compartments (nuclear, cytosolic, mitochondrial, endoplasmic reticulum, etc.). Some of the hsp70 family proteins are listed below: - In Escherichia coli and other bacteria, the main hsp70 protein is known as the dnaK protein. A second protein, hscA, has been recently discovered. dnaK is also found in the chloroplast genome of red algae. - In yeast, at least ten hsp70 proteins are known to exist: SSA1 to SSA4, SSB1, SSB2, SSC1, SSD1 (KAR2), SSE1 (MSI3) and SSE2. - In Drosophila, there are at least eight different hsp70 proteins: HSP70, HSP68, and HSC-1 to HSC-6. - In mammals, there are at least eight different proteins: HSPA1 to HSPA6, HSC70, and GRP78 (also known as the immunoglobulin heavy chain binding protein (BiP)). - In the sugar beet yellow virus (SBYV), a hsp70 homolog has been shown [5] to exist. - In archaebacteria, hsp70 proteins are also present [6]. All proteins belonging to the hsp70 family bind ATP. A variety of functions has been postulated for hsp70 proteins. It now appears [7] that some hsp70 proteins play an important role in the transport of proteins across membranes. They also seem to be involved in protein folding and in the assembly/disassembly of protein complexes [8]. Three signature patterns for the hsp70 family of proteins were derived: the first centered on a conserved pentapeptide found in the N-terminal section of these proteins; the two others on conserved regions located in the central part of the sequence.
[0838] Consensus pattern: [IV]-D-L-G-T-[ST]-x-[SC]

Consensus pattern: [LIVMF]-[LIVMFY]-[DN]-[LIVMFS]-G-[GSH]-[GS]-[AST]-x(3)-[ST]-[LIVM]-[LIVMFC]
Consensus pattern: [LIVMY]-x-[LIVMF]-x-G-G-x-[ST]-x-[LIVM]-P-x-[LIVM]-x-[DEQKRSTA]

[1] Lindquist S., Craig E.A. Annu. Rev. Genet. 22:631-677(1988).
[2] Pelham H.R.B. Cell 46:959-961(1986).
[3] Pelham H.R.B. Nature 332:776-777(1988).
[4] Craig E.A. BioEssays 11:48-52(1989).
[5] Agranovsky A.A., Boyko V.P., Karasev A.V., Koonin E.V., Dolja V.V. J. Mol. Biol. 217:603-610(1991).
[6] Gupta R.S., Singh B. J. Bacteriol. 174:4594-4605(1992).
[7] Deshaies R.J., Koch B.D., Schekman R. Trends Biochem. Sci. 13:384-386(1988).
[8] Craig E.A., Gross C.A. Trends Biochem. Sci. 16:135-140(1991).
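Where a family is described by several signatures, as with the three hsp70 patterns above, a sequence can be checked against all of them and the hits reported together. The sketch below is illustrative only. The regular expressions are hand-translated from the three consensus patterns as read here (the third element of the second pattern is taken to be [DN], which is an assumption given the state of the printed text), and the names `HSP70_SIGNATURES` and `matched_signatures` as well as the toy sequence are hypothetical.

```python
import re

# Hand-translated regexes for the three hsp70 consensus patterns:
# x -> '.', x(3) -> '.{3}', bracketed sets kept as character classes.
HSP70_SIGNATURES = {
    "hsp70_1": re.compile(r"[IV]DLGT[ST].[SC]"),
    "hsp70_2": re.compile(
        r"[LIVMF][LIVMFY][DN][LIVMFS]G[GSH][GS][AST].{3}[ST][LIVM][LIVMFC]"
    ),
    "hsp70_3": re.compile(
        r"[LIVMY].[LIVMF].GG.[ST].[LIVM]P.[LIVM].[DEQKRSTA]"
    ),
}


def matched_signatures(protein: str) -> dict:
    """Map each matching signature name to the 0-based start of its first hit."""
    sequence = protein.upper()
    hits = {}
    for name, regex in HSP70_SIGNATURES.items():
        match = regex.search(sequence)
        if match is not None:
            hits[name] = match.start()
    return hits


# Synthetic toy sequence (not a real protein) carrying one hit per signature.
toy = "MAK" + "IDLGTTYS" + "AAAA" + "IFDLGGGTFDVSIL" + "AAAA" + "VLLVGGSTRIPKVQK"
print(matched_signatures(toy))
```

Requiring more than one signature to match is a common way to reduce chance hits, at the cost of missing divergent or partial sequences; which threshold to use is a design choice left to the user.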

[0839] 283. Heat shock hsp90 proteins family signature

Prokaryotic and eukaryotic organisms respond to heat shock or other environmental stress by the induction of the synthesis of proteins collectively known as heat-shock proteins (hsp) [1]. Amongst them is a family of proteins, with an average molecular weight of 90 Kd, known as the hsp90 proteins. Proteins known to belong to this family are: - Escherichia coli and other bacteria heat shock protein c62.5 (gene htpG). - Vertebrate hsp 90-alpha (hsp 86) and hsp 90-beta (hsp 84). - Drosophila hsp 82 (hsp 83). - Trypanosoma cruzi hsp 85. - Plants Hsp82 or Hsp83. - Yeast and other fungi HSC82 and HSP82. - The endoplasmic reticulum protein 'endoplasmin' (also known as Erp99 in mouse, GRP94 in hamster, and hsp 108 in chicken). The exact function of hsp90 proteins is not yet known. In higher eukaryotes, hsp90 has been found associated with steroid hormone receptors, with tyrosine kinase oncogene products of several retroviruses, with eIF2alpha kinase, and with actin and tubulin. Hsp90 are probable chaperonins that possess ATPase activity [2,3]. As a signature pattern for the hsp90 family of proteins, a highly conserved region found in the N-terminal part of these proteins was selected.

[0840] Consensus pattern: Y-x-[NQH]-K-[DE]-[IVA]-F-L-R-[ED]

[1] Lindquist S., Craig E.A. Annu. Rev. Genet. 22:631-677(1988).
[2] Nadeau K., Das A., Walsh C.T. J. Biol. Chem. 268:1479-1487(1993).
[3] Jakob U., Buchner J. Trends Biochem. Sci. 19:205-211(1994).

[0841] 284. Helix-turn-helix (HTH3)

[0842] This large family of DNA binding helix-turn-helix proteins includes Cro (Swiss:P03036) and CI (Swiss:P03034).
[0843] 285. Heme oxygenase signature

Heme oxygenase (EC 1.14.99.3) (HO) [1] is the microsomal enzyme that, in animals, carries out the oxidation of heme: it cleaves the heme ring at the alpha methene bridge to form biliverdin and carbon monoxide. Biliverdin is subsequently converted to bilirubin by biliverdin reductase. In mammals there are three isozymes of heme oxygenase: HO-1 to HO-3. The first two isozymes differ in their tissue expression and their inducibility: HO-1 is highly inducible by its substrate heme and by various non-heme substances, while HO-2 is non-inducible. It has been suggested [2] that HO-2 could be implicated in the production of carbon monoxide in the brain, where it is said to act as a neurotransmitter. In the genome of the chloroplast of red algae as well as in cyanobacteria, there is a heme oxygenase (gene pbsA) that is the key enzyme in the synthesis of the chromophoric part of the photosynthetic antennae [3]. A heme oxygenase is also present in the bacterium Corynebacterium diphtheriae (gene hmuO), where it is involved in the acquisition of iron from the host heme [4]. There is, in the central section of these enzymes, a well conserved region centered on a histidine residue which is proposed to play a key role in binding the substrate heme at the active center of the enzyme. This region was used as a signature pattern.

[0844] Consensus pattern: L-[IV]-A-H-[STACH]-Y-[STV]-[RT]-Y-[LIVM]-G [H binds the heme]

[1] Maines M.D. FASEB J. 2:2557-2568(1988).
[2] Barinaga M. Science 259:309-309(1993).
[3] Richaud C., Zabulon G. Proc. Natl. Acad. Sci. U.S.A. 94:11736-11741(1997).
[4] Schmitt M.P. J. Bacteriol. 179:838-845(1997).

[0845] 286. Hepatitis core antigen.
[0846] The core antigen of hepatitis viruses possesses a carboxyl terminus rich in arginine. On this basis it was predicted that the core antigen would bind DNA [1]. There is some experimental evidence to support this [2].
[0847] [1] Pasek M, Goto T, Gilbert W, Zink B, Schaller H, MacKay P, Leadbetter G, Murray K; Nature 1979;282:575-579. [2] Gallina A, Bonelli F, Zentilin L, Rindi G, Muttini M, Milanesi G; J Virol 1989;63:4645-4652.
[0848] 287. Histidine biosynthesis protein

[0849] Proteins involved in steps 4 and 6 of the histidine biosynthesis pathway are contained in this family. Histidine is formed by several complex and distinct biochemical reactions catalysed by eight enzymes. The enzymes in this Pfam entry are called His6 and His7 in eukaryotes and HisA and HisF in prokaryotes.
[0850] [1] Fani R, Tamburini E, Mori E, Lazcano A, Lio P, Barberio C, Casalone E, Cavalieri D, Perito B, Polsinelli M; Gene 1997;197:9-17. [2] Fani R, Lio P, Chiarelli I, Bazzicalupo M; J Mol Evol 1994;38:489-495.
[0851] 288. Histone deacetylase family

[0852] Histones can be reversibly acetylated on several lysine residues. Regulation of transcription is caused in part by this mechanism. Histone deacetylases catalyse the removal of the acetyl group. Histone deacetylases are related to other proteins [1].

[0853] [1] Leipe DD, Landsman D; Nucleic Acids Res 1997;25:3693-3697.
[0854] 289. Histidinol dehydrogenase signature

Histidinol dehydrogenase (EC 1.1.1.23) (HDH) catalyzes the terminal step in the biosynthesis of histidine in bacteria, fungi, and plants, the four-electron oxidation of L-histidinol to histidine. In bacteria HDH is a single chain polypeptide; in fungi it is the C-terminal domain of a multifunctional enzyme which catalyzes three different steps of histidine biosynthesis; and in plants it is expressed as a nuclear encoded protein precursor which is exported to the chloroplast [1]. As a signature pattern, a highly conserved region located in the central part of HDH was selected. This region does not correspond to the part of the enzyme that, in most but not all HDH sequences, contains a cysteine residue which, in Salmonella typhimurium, has been said [2] to be important for the catalytic activity of the enzyme.

[0855] Consensus pattern: I-D-x(2)-A-G-P-[ST]-E-[LIVS]-[LIVMA](3)-[AC]-x(3)-A-x(4)-[LIVM]-[AV]-[SACL]-[DE]-[LIVMFC]-[LIVM]-[SA]-x(2)-E-H

[1] Nagai A., Ward E., Beck J., Tada S., Chang J.-Y., Scheidegger A., Ryals J. Proc. Natl. Acad. Sci. U.S.A. 88:4133-4137(1991).
[2] Grubmeyer C.T., Gray W.R. Biochemistry 25:4778-4784(1986).

[0856] 290. Homoserine dehydrogenase signature

Homoserine dehydrogenase (EC 1.1.1.3) (HDh) [1,2] catalyzes NAD-dependent reduction of aspartate beta-semialdehyde into homoserine. This reaction is the third step in a pathway leading from aspartate to homoserine. The latter participates in the biosynthesis of threonine and then isoleucine as well as in that of methionine. HDh is found either as a single chain protein, as in some bacteria and yeast, or as a bifunctional enzyme consisting of an N-terminal aspartokinase domain and a C-terminal HDh domain, as in bacteria such as Escherichia coli and in plants. As a signature pattern, the best conserved region of HDh has been selected. This is a segment of 23 to 24 residues located in the central section and that contains two conserved aspartate residues.

[0857] Consensus pattern: A-x(3)-G-[LIVMFY]-[STAG]-x(2,3)-[DNS]-P-x(2)-D-[LIVM]-x-G-x-D-x(3)-K

[1] Thomas D., Barbey R., Surdin-Kerjan Y. FEBS Lett. 323:289-293(1993).
[2] Cami B., Clepet C., Patte J.-C. Biochimie 75:487-495(1993).
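
The consensus patterns in these entries use a compact notation: hyphen-separated positions, x for any residue, square brackets for allowed residues, and a count or range in parentheses. As an illustration only, the following Python sketch translates such a pattern into a regular expression and applies it to the homoserine dehydrogenase pattern. The function name `prosite_to_regex`, the treatment of curly braces as excluded residues, and the toy peptide are assumptions made for this sketch; the peptide is synthetic and is not a real HDh sequence.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style consensus pattern into a Python regex string.

    Assumed conventions: '-' separates positions, 'x' is any residue,
    '[..]' lists allowed residues, '{..}' lists excluded residues, and
    '(n)' or '(n,m)' after an element is a repeat count or range.
    """
    parts = []
    for element in pattern.strip().strip("-.").split("-"):
        m = re.fullmatch(r"([^()]+)(?:\((\d+(?:,\d+)?)\))?", element.strip())
        if m is None:
            raise ValueError(f"cannot parse pattern element {element!r}")
        core, count = m.group(1), m.group(2)
        if core == "x":
            core = "."                          # any residue
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"      # excluded residues
        # bracketed sets and single letters are already valid regex atoms
        parts.append(core + ("{" + count + "}" if count else ""))
    return "".join(parts)

# Homoserine dehydrogenase consensus pattern as read from the entry above.
HDH_PATTERN = "A-x(3)-G-[LIVMFY]-[STAG]-x(2,3)-[DNS]-P-x(2)-D-[LIVM]-x-G-x-D-x(3)-K"
HDH_REGEX = re.compile(prosite_to_regex(HDH_PATTERN))

# Synthetic peptide built position by position to satisfy the pattern
# (hypothetical, for illustration only).
TOY_PEPTIDE = "AAAAGLSAADPAADLAGADAAAK"
hit = HDH_REGEX.search(TOY_PEPTIDE)
if hit:
    print(hit.start() + 1, hit.end(), hit.group())  # 1-based start, end, matched segment
```

The x(2,3) element is what allows the matched segment to be either 23 or 24 residues long, in line with the description of the signature region.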

[0858] 291. haloacid dehalogenase-like hydrolase

[0859] This family is structurally different from the alpha/beta hydrolase family (abhydrolase). This family includes L-2-haloacid dehalogenase, epoxide hydrolases and phosphatases. The structure of the family consists of two domains. One is an inserted four helix bundle, which is the least well conserved region of the alignment, between residues 16 and 96 of Swiss P24089. The rest of the fold is composed of the core alpha/beta domain. [1] Hisano T, Hata Y, Fujii T, Liu JQ, Kurihara T, Esaki N, Soda K, J Biol Chem 1996;271:20322-20330.

[0860] 292. DEAD and DEAH box families ATP-dependent helicases signatures (helicase_C)
A number of eukaryotic and prokaryotic proteins have been characterised [1,2,3] on the basis of their structural similarity. They all seem to be involved in ATP-dependent, nucleic-acid unwinding. Proteins currently known to belong to this family are:

- Initiation factor eIF-4A. Found in eukaryotes, this protein is a subunit of a high molecular weight complex involved in 5'cap recognition and the binding of mRNA to ribosomes. It is an ATP-dependent RNA-helicase.
- PRP5 and PRP28. These yeast proteins are involved in various ATP-requiring steps of the pre-mRNA splicing process.
- PL10, a mouse protein expressed specifically during spermatogenesis.
- An3, a Xenopus putative RNA helicase, closely related to PL10.
- SPP81/DED1 and DBP1, two yeast proteins probably involved in pre-mRNA splicing and related to PL10.
- Caenorhabditis elegans helicase glh-1.
- MSS116, a yeast protein required for mitochondrial splicing.
- SPB4, a yeast protein involved in the maturation of 25S ribosomal RNA.
- p68, a human nuclear antigen. p68 has ATPase and DNA-helicase activities in vitro. It is involved in cell growth and division.
- Rm62 (p62), a Drosophila putative RNA helicase related to p68.
- DBP2, a yeast protein related to p68.
- DHH1, a yeast protein.
- DRS1, a yeast protein involved in ribosome assembly.
- MAK5, a yeast protein involved in maintenance of dsRNA killer plasmid.
- ROK1, a yeast protein.
- ste13, a fission yeast protein.
- Vasa, a Drosophila protein important for oocyte formation and specification of embryonic posterior structures.
- Me31B, a Drosophila maternally expressed protein of unknown function.
- dbpA, an Escherichia coli putative RNA helicase.
- deaD, an Escherichia coli putative RNA helicase which can suppress a mutation in the rpsB gene for ribosomal protein S2.
- rhlB, an Escherichia coli putative RNA helicase.
- rhlE, an Escherichia coli putative RNA helicase.
- srmB, an Escherichia coli protein that shows RNA-dependent ATPase activity. It probably interacts with 23S ribosomal RNA.
- Caenorhabditis elegans hypothetical proteins T26G10.1, ZK512.2 and ZK686.2.
- Yeast hypothetical protein YHR065c.
- Yeast hypothetical protein YHR169w.
- Fission yeast hypothetical protein SpAC31A2.07c.
- Bacillus subtilis hypothetical protein yxiN.

All these proteins share a number of conserved sequence motifs. Some of them are specific to this family while others are shared by other ATP-binding proteins or by proteins belonging to the helicases 'superfamily' [4,E1]. One of these motifs, called the 'D-E-A-D-box', represents a special version of the B motif of ATP-binding proteins. Some other proteins belong to a subfamily which have His instead of the second Asp and are thus said to be 'D-E-A-H-box' proteins [3,5,6,E1]. Proteins currently known to belong to this subfamily are:

- PRP2, PRP16, PRP22 and PRP43. These yeast proteins are all involved in various ATP-requiring steps of the pre-mRNA splicing process.
- Fission yeast prh1, which may be involved in pre-mRNA splicing.
- Male-less (mle), a Drosophila protein required in males, for dosage compensation of X chromosome linked genes.
- RAD3 from yeast. RAD3 is a DNA helicase involved in excision repair of DNA damaged by UV light, bulky adducts or cross-linking agents. Fission yeast rad15 (rhp3) and mammalian DNA excision repair protein XPD (ERCC-2) are the homologs of RAD3.
- Yeast CHL1 (or CTF1), which is important for chromosome transmission and normal cell cycle progression in G(2)/M.
- Yeast TPS1.
- Yeast hypothetical protein YKL078w.
- Caenorhabditis elegans hypothetical proteins C06E1.10 and K03H1.2.
- Poxviruses' early transcription factor 70 Kd subunit, which acts with RNA polymerase to initiate transcription from early gene promoters.
- I8, a putative vaccinia virus helicase.
- hrpA, an Escherichia coli putative RNA helicase.

Signature patterns were developed for both subfamilies.
[0861] Consensus pattern: [LIVMF](2)-D-E-A-D-[RKEN]-x-[LIVMFYGSTN]
Consensus pattern: [GSAH]-x-[LIVMF](3)-D-E-[ALIV]-H-[NECR]

Note: proteins belonging to this family also contain a copy of the ATP/GTP-binding motif 'A' (P-loop) (see the relevant entry <PDOC00017>).

[1] Schmid S.R., Linder P. Mol. Microbiol. 6:283-292(1992).

[2] Linder P., Lasko P., Ashburner M., Leroy P., Nielsen P.J., Nishi K., Schnier J., Slonimski P.P. Nature 337:121-122(1989).

[3] Wassarman D.A., Steitz J.A. Nature 349:463-464(1991).

[4] Hodgman T.C. Nature 333:22-23(1988) and Nature 333:578-578(1988) (Errata).

[5] Harosh I., Deschavanne P. Nucleic Acids Res. 19:6331-6331(1991).

[6] Koonin E.V., Senkevich T.G. J. Gen. Virol. 73:989-993(1992).
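
Because the two subfamilies differ only in whether the second Asp of the box is replaced by His, a sequence can be assigned to one or the other by testing the two signatures in turn. The sketch below is a hedged illustration: the regular expressions are hand translations of the two consensus patterns as read from this entry, the function name `classify_helicase` is hypothetical, and the test fragments are synthetic peptides written to satisfy or miss the patterns, not database sequences.

```python
import re

# Hand-translated regex forms of the two consensus patterns (assumed reading).
DEAD_BOX = re.compile(r"[LIVMF]{2}DEAD[RKEN].[LIVMFYGSTN]")
DEAH_BOX = re.compile(r"[GSAH].[LIVMF]{3}DE[ALIV]H[NECR]")

def classify_helicase(seq: str) -> str:
    """Return 'DEAD', 'DEAH' or 'none' depending on which signature is found."""
    if DEAD_BOX.search(seq):
        return "DEAD"
    if DEAH_BOX.search(seq):
        return "DEAH"
    return "none"

# Synthetic fragments (hypothetical, for illustration only).
dead_like = "GTRMFVLDEADEMLSRG"   # ...V-L-D-E-A-D-E-M-L...
deah_like = "KYSVIMLDEAHERTLH"    # ...S-V-I-M-L-D-E-A-H-E...
unrelated = "MKTAYIAKQRQISFVK"

for fragment in (dead_like, deah_like, unrelated):
    print(fragment, classify_helicase(fragment))
```

A signature hit of this kind only suggests subfamily membership; as the note above indicates, members also carry the ATP/GTP-binding 'A' motif, which a fuller check would test separately.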


[0862] 293. Heme-binding domain in cytochrome b5 and oxidoreductases (heme_1)

[0863] Cytochrome b5 is a membrane-bound hemoprotein which acts as an electron carrier for several membrane-bound oxygenases [1]. There are two homologous forms of b5, one found in microsomes and one found in the outer membrane of mitochondria. Two conserved histidine residues serve as axial ligands for the heme group. The structure of a number of oxidoreductases consists of the juxtaposition of a heme-binding domain homologous to that of b5 and either a flavodehydrogenase or a molybdopterin domain. These enzymes are:

- Lactate dehydrogenase (EC 1.1.2.3) [2], an enzyme that consists of a flavodehydrogenase domain and a heme-binding domain called cytochrome b2.
- Nitrate reductase (EC 1.6.6.1), a key enzyme involved in the first step of nitrate assimilation in plants, fungi and bacteria [3,4]. Consists of a molybdopterin domain (see <PDOC00484>), a heme-binding domain called cytochrome b557, as well as a cytochrome reductase domain.
- Sulfite oxidase (EC 1.8.3.1) [5], which catalyzes the terminal reaction in the oxidative degradation of sulfur-containing amino acids. Also consists of a molybdopterin domain and a heme-binding domain.

This family of proteins also includes:

- TU-36B, a Drosophila muscle protein of unknown function [6].
- Fission yeast hypothetical protein SpAC1F12.10c.
- Yeast hypothetical protein YMR073c.
- Yeast hypothetical protein YMR272c.

[0864] A segment was used which includes the first of the two histidine heme ligands, as a signature pattern for the heme-binding domain of cytochrome b5 family.

[0865] Consensus pattern: [FY]-[LIVMK]-x(2)-H-P-[GA]-G [H is a heme axial ligand]

[1] Ozols J. Biochim. Biophys. Acta 997:121-130(1989).
[2] Guiard B. EMBO J. 4:3265-3272(1985).

[3] Calza R., Huttner E., Vincentz M., Rouze P., Galangau F., Vaucheret H., Cherel I., Meyer C., Kronenberger J., Caboche M. Mol. Gen. Genet. 209:552-562(1987).

[4] Crawford N.M., Smith M., Bellissimo D., Davis R.W. Proc. Natl. Acad. Sci. U.S.A. 85:5006-5010(1988).
[5] Guiard B., Lederer F. Eur. J. Biochem. 100:441-453(1979).
[6] Levin R.J., Boychuk P.L., Croniger C.M., Kazzaz J.A., Rozek C.E. Nucleic Acids Res. 17:6349-6367(1989).

[0866] 294. Hexapeptide-repeat containing-transferases signature

On the basis of sequence similarity, a number of transferases have been proposed [1,2,3,4] to belong to a single family. These proteins are:

- Serine acetyltransferase (EC 2.3.1.30) (SAT) (gene cysE), an enzyme involved in cysteine biosynthesis.
- Azotobacter chroococcum nitrogen fixation protein nifP. NifP is most probably a SAT involved in the optimization of nitrogenase activity.
- Escherichia coli thiogalactoside acetyltransferase (EC 2.3.1.18) (gene lacA), an enzyme involved in the biosynthesis of lactose.
- UDP-N-acetylglucosamine acyltransferase (EC 2.3.1.129) (gene lpxA), an enzyme involved in the biosynthesis of lipid A, a phosphorylated glycolipid that anchors the lipopolysaccharide to the outer membrane of the cell.
- UDP-3-O-[3-hydroxymyristoyl] glucosamine N-acyltransferase (EC 2.3.1.-) (gene lpxD or firA), which is also involved in the biosynthesis of lipid A.
- Chloramphenicol acetyltransferase (CAT) (EC 2.3.1.28) from Agrobacterium tumefaciens, Bacillus sphaericus, Escherichia coli plasmid IncFII NR79, Pseudomonas aeruginosa, and Staphylococcus aureus plasmid pIP630. These CAT are not evolutionary related to the main family of CAT (see <PDOC00093>).
- Rhizobium nodulation protein nodL. NodL is an acetyltransferase involved in the O-acetylation of Nod factors.
- Bacterial maltose O-acetyltransferase (EC 2.3.1.79).
- Bacterial tetrahydrodipicolinate N-succinyltransferase (EC 2.3.1.117) (gene dapD), which catalyzes the fourth step in the biosynthesis of diaminopimelate and lysine from aspartate semialdehyde.
- Bacterial N-acetylglucosamine-1-phosphate uridyltransferase (EC 2.7.7.23) (gene glmU or gcaD or tms), an enzyme involved in peptidoglycan and lipopolysaccharide biosynthesis.
- Staphylococcus aureus protein capG, which is involved in biosynthesis of type 1 capsular polysaccharide.
- Yeast hypothetical protein YJL218w, which is highly similar to Escherichia coli lacA.
- Fission yeast hypothetical protein SpAC18B11.09c.
- Methanococcus jannaschii hypothetical protein MJ1064.

These proteins have been shown [3,4] to contain a repeat structure composed of tandem repeats of a [LIV]-G-x(4) hexapeptide which, in the tertiary structure of lpxA [5], has been shown to form a left-handed parallel beta helix. Our signature pattern is based on a fourfold repeat of this hexapeptide.

[0867] Consensus pattern: [LIV]-[GAED]-x(2)-[STAV]-x-[LIV]-x(3)-[LIVAC]-x-[LIV]-[GAED]-x(2)-[STAVR]-x-[LIV]-[GAED]-x(2)-[STAV]-x-[LIV]-x(3)-[LIV]

[1] Downie J.A. Mol. Microbiol. 3:1649-1651(1989).
[2] Parent R., Roy P.H. J. Bacteriol. 174:2891-2897(1992).
[3] Vaara M. FEMS Microbiol. Lett. 97:249-254(1992).
[4] Vuorio R., Haerkonen T., Tolvanen M., Vaara M. FEBS Lett. 337:289-292(1994).

[5] Raetz C.R.H., Roderick S.L. Science 270:997-1000(1995).
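
The idea of a signature built from a fourfold repeat of the [LIV]-G-x(4) hexapeptide can be sketched as a search for runs of at least four tandem hexapeptides. This is a simplified illustration, not the consensus pattern itself, which tolerates other residues at some positions (for example [GAED] in place of G). The function name `longest_hexapeptide_run` and the test sequences are hypothetical.

```python
import re

HEXAPEPTIDE = r"[LIV]G.{4}"                          # one strict [LIV]-G-x(4) unit
FOURFOLD = re.compile(rf"(?:{HEXAPEPTIDE}){{4,}}")   # four or more units in tandem

def longest_hexapeptide_run(seq: str):
    """Return (1-based start, number of repeats) of the longest tandem run,
    or None when no run of at least four hexapeptides is present."""
    best = None
    for m in FOURFOLD.finditer(seq):
        repeats = (m.end() - m.start()) // 6
        if best is None or repeats > best[1]:
            best = (m.start() + 1, repeats)
    return best

# Synthetic sequences (hypothetical, for illustration only).
four_repeats = "MK" + "IGDNVR" + "VGANCT" + "LGPHSE" + "IGKQWY" + "PPP"
three_repeats = "MK" + "IGDNVR" + "VGANCT" + "LGPHSE" + "PPP"

print(longest_hexapeptide_run(four_repeats))
print(longest_hexapeptide_run(three_repeats))
```

Requiring four units rather than one keeps chance matches rare, since a single [LIV]-G pair followed by any four residues occurs frequently in unrelated proteins.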

[0868] 295. Hexokinases signature

Hexokinase (EC 2.7.1.1) [1,2] is an important glycolytic enzyme that catalyzes the phosphorylation of keto- and aldohexoses (e.g. glucose, mannose and fructose) using MgATP as the phosphoryl donor. In vertebrates there are four major isoenzymes, commonly referred to as types I, II, III and IV. Type IV hexokinase, which is often incorrectly designated glucokinase [3], is only expressed in liver and pancreatic beta-cells and plays an important role in modulating insulin secretion; it is a protein of a molecular mass of about 50 Kd. Hexokinases of types I to III, which have low Km values for glucose, have a molecular mass of about 100 Kd. Structurally they consist of a very small N-terminal hydrophobic membrane-binding domain followed by two highly similar domains of 450 residues. The first domain has lost its catalytic activity and has evolved into a regulatory domain. In yeast there are three different isozymes: hexokinase PI (gene HXK1), PII (gene HXK2), and glucokinase (gene GLK1). All three proteins have a molecular mass of about 50 Kd. All these enzymes contain one (or two in the case of types I to III isozymes) strongly conserved region which has been shown [4] to be involved in substrate binding. A pattern from that region has been derived.

[0869] Consensus pattern: [LIVM]-G-F-[TN]-F-S-[FY]-P-x(5)-[LIVM]-[DNST]-x(3)-[LIVM]-x(2)-W-T-K-x-[LF]

[0870] [1] Middleton R.J. Biochem. Soc. Trans. 18:180-183(1990). [2] Griffin L.D., Gelb B.D., Wheeler D.A., Davison D., Adams V., McCabe E.R. Genomics 11:1014-1024(1991). [3] Cornish-Bowden A., Cardenas M.L. Trends Biochem. Sci. 16:281-282(1991). [4] Schirch D.M., Wilson J.E. Arch. Biochem. Biophys. 254:385-396(1987).
[0871] 296. Histone H2A signature (his1)

Histone H2A is one of the four histones, along with H2B, H3 and H4, which forms the eukaryotic nucleosome core. Using alignments of histone H2A sequences [1,2,E1], a conserved region in the N-terminal part of H2A was selected as a signature pattern. This region is conserved both in classical S-phase regulated H2A's and in variant histone H2A's which are synthesized throughout the cell cycle.

[0872] Consensus pattern: [AC]-G-L-x-F-P-V

[1] Wells D.E., Brown D. Nucleic Acids Res. 19:2173-2188(1991).

[2] Thatcher T.H., Gorovsky M.A. Nucleic Acids Res. 22:174-179(1994).


[0873] Histone H4 signature (his2)

[0874] Histone H4 is one of the four histones, along with H2A, H2B and H3, which forms the eukaryotic nucleosome core. Along with H3, it plays a central role in nucleosome formation. The sequence of histone H4 has remained almost invariant in more than 2 billion years of evolution [1,E1]. The region used as a signature pattern is a pentapeptide found in positions 14 to 18 of all H4 sequences. It contains a lysine residue which is often acetylated [2] and a histidine residue which is implicated in DNA-binding [3].
[0875] Consensus pattern: G-A-K-R-H

[1] Thatcher T.H., Gorovsky M.A. Nucleic Acids Res. 22:174-179(1994).
[2] Doenecke D., Gallwitz D. Mol. Cell. Biochem. 44:113-128(1982).

[3] Ebralidse K.K., Grachev S.A., Mirzabekov A.D. Nature 331:365-367(1988).
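
The statement that the pentapeptide lies at positions 14 to 18 can be checked with a few lines of Python. The sketch below assumes the N-terminal residues of human histone H4 as commonly published, numbered from the mature protein with the initiator methionine removed; the sequence is quoted here as an assumption and is not taken from this document.

```python
# N-terminal residues of human histone H4, mature numbering (assumed here).
H4_NTERM = "SGRGKGGKGLGKGGAKRHRKVLRDNIQ"
SIGNATURE = "GAKRH"  # consensus pattern G-A-K-R-H

index = H4_NTERM.find(SIGNATURE)          # 0-based index, -1 if absent
start = index + 1                         # 1-based first position
end = start + len(SIGNATURE) - 1          # 1-based last position

# Positions of the lysine and histidine inside the pentapeptide.
lysine_pos = start + SIGNATURE.index("K")
histidine_pos = start + SIGNATURE.index("H")

print(start, end, lysine_pos, histidine_pos)
```

With this numbering the lysine and the histidine of the signature fall at positions 16 and 18 respectively.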

[0876] Histone H3 signatures (his3)

Histone H3 is one of the four histones, along with H2A, H2B and H4, which forms the eukaryotic nucleosome core. It is a highly conserved protein of 135 amino acid residues [1,2,E1]. The following proteins have been found to contain a C-terminal H3-like domain:

- Mammalian centromeric protein CENP-A [3]. Could act as a core histone necessary for the assembly of centromeres.
- Yeast chromatin-associated protein CSE4 [4].
- Caenorhabditis elegans chromosome III encodes two highly related proteins (F54C8.2 and F58A4.3) whose C-terminal section is evolutionary related to the last 100 residues of H3. The function of these proteins is not yet known.

Two signature patterns were developed. The first one corresponds to a perfectly conserved heptapeptide in the N-terminal part of H3. The second one is derived from a conserved region in the central section of H3.
[0877] Consensus pattern: K-A-P-R-K-Q-L
Consensus pattern: P-F-x-[RA]-L-[VA]-[KRQ]-[DEG]-[IV]

[1] Wells D.E., Brown D. Nucleic Acids Res. 19:2173-2188(1991).

[2] Thatcher T.H., Gorovsky M.A. Nucleic Acids Res. 22:174-179(1994).

[3] Sullivan K.F., Hechenberger M., Masri K. J. Cell Biol. 127:581-592(1994).

[4] Stoler S., Keith K.C., Curnick K.E., Fitzgerald-Hayes M. Genes Dev. 9:573-586(1995).

[0878] Histone H2B signature (his4)

[0879] Histone H2B is one of the four histones, along with H2A, H3 and H4, which forms the eukaryotic nucleosome core. Using alignments of histone H2B sequences [1,2,E1], a conserved region was selected in the C-terminal part of H2B.

[0880] Consensus pattern: [KR]-E-[LIVM]-[EQ]-T-x(2)-[KR]-x-[LIVM](2)-x-[PAG]-[DE]-L-x-[KR]-H-A-[LIVM]-[STA]-E-G

[1] Wells D.E., Brown D. Nucleic Acids Res. 19:2173-2188(1991).

[2] Thatcher T.H., Gorovsky M.A. Nucleic Acids Res. 22:174-179(1994).

[0881] 297. 'Homeobox' domain signature and profile (home1)

The 'homeobox' is a protein domain of 60 amino acids [1 to 5,E1] first identified in a number of Drosophila homeotic and segmentation proteins. It has since been found to be extremely well conserved in many other animals, including vertebrates. This domain binds DNA through a helix-turn-helix type of structure. Some of the proteins which contain a homeobox domain play an important role in development. Most of these proteins are known to be sequence-specific DNA-binding transcription factors. The homeobox domain has also been found to be very similar to a region of the yeast mating type proteins. These are sequence-specific DNA-binding proteins that act as master switches in yeast differentiation by controlling gene expression in a cell type-specific fashion. A schematic representation of the homeobox domain is shown below. The helix-turn-helix region is shown by the symbols 'H' (for helix), and 't' (for turn).

xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxHHHHHHHHtttHHHHHHHHHxxxxxxxxxx
|        |         |         |         |         |         |
1        10        20        30        40        50        60

The pattern to detect homeobox sequences that was developed is 24 residues long and spans positions 34 to 57 of the homeobox domain.
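
The relationship between the 60-residue schematic and the 24-residue signature window can be made explicit with a small sketch. It assumes the layout suggested by the schematic, namely 30 unmarked positions, an 8-residue helix, a 3-residue turn, a 9-residue helix and 10 trailing positions; these counts and the helper name `window` are assumptions of the example.

```python
# Schematic of the 60-residue homeobox domain (segment lengths assumed).
SCHEMATIC = "x" * 30 + "H" * 8 + "t" * 3 + "H" * 9 + "x" * 10

def window(start: int, end: int) -> str:
    """Return the schematic symbols for 1-based positions start..end inclusive."""
    return SCHEMATIC[start - 1:end]

# Span of the helix-turn-helix region in 1-based positions.
hth_start = SCHEMATIC.index("H") + 1
hth_end = SCHEMATIC.rindex("H") + 1

# The signature pattern covers positions 34 to 57.
signature = window(34, 57)

print(len(SCHEMATIC), hth_start, hth_end)
print(signature, len(signature))
```

Under these assumptions the signature starts inside the first helix, runs through the turn and the whole second helix, and ends seven residues beyond the helix-turn-helix region.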

[0882] Consensus pattern: [LIVMFYG]-[ASLVR]-x(2)-[LIVMSTACN]-x-[LIVM]-x(4)-[LIV]-[RKNQESTAIY]-[LIVFSTNKH]-W-[FYVC]-x-[NDQTAH]-x(5)-[RKNAIMW]

[1] Gehring W.J. (In) Guidebook to the homeobox genes, Duboule D., Ed., pp1-10, Oxford University Press, Oxford, (1994).

[2] Buerglin T.R. (In) Guidebook to the homeobox genes, Duboule D., Ed., pp25-72, Oxford University Press, Oxford, (1994).

[3] Gehring W.J. Trends Biochem. Sci. 17:277-280(1992).
[4] Gehring W.J., Hiromi Y. Annu. Rev. Genet. 20:147-173(1986).
[5] Schofield P.N. Trends Neurosci. 10:3-6(1987).

[0883] 'Homeobox' antennapedia-type protein signature (home2)

The homeotic Hox proteins are sequence-specific transcription factors. They are part of a developmental regulatory system that provides cells with specific positional identities on the anterior-posterior (A-P) axis [1]. The Hox proteins contain a 'homeobox' domain. In Drosophila and other insects, there are eight different Hox genes that are encoded in two gene complexes, ANT-C and BX-C. In vertebrates there are 38 genes organized in four complexes. In six of the eight Drosophila Hox genes the homeobox domain is highly similar and a conserved hexapeptide is found five to sixteen amino acids upstream of the homeobox domain. The six Drosophila proteins that belong to this group are antennapedia (Antp), abdominal-A (abd-A), deformed (Dfd), proboscipedia (pb), sex combs reduced (scr) and ultrabithorax (ubx), and are collectively known as the 'antennapedia' subfamily. In vertebrates the corresponding Hox genes are known [2] as Hox-A2, A3, A4, A5, A6, A7, Hox-B1, B2, B3, B4, B5, B6, B7, B8, Hox-C4, C5, C6, C8, Hox-D1, D3, D4 and D8. Caenorhabditis elegans lin-39 and mab-5 are also members of the 'antennapedia' subfamily. As a signature pattern for this subfamily of homeobox proteins, the conserved hexapeptide was used.
[0884] Consensus pattern: [LIVMFE]-[FY]-P-W-M-[KRQTA]

[1] McGinnis W., Krumlauf R. Cell 68:283-302(1992).
[2] Scott M.P. Cell 71:551-553(1992).

[0885] 'Homeobox' engrailed-type protein signature (home3)

[0886] Most proteins which contain a 'homeobox' domain can be classified [1,2], on the basis of their sequence characteristics, in three subfamilies: engrailed, antennapedia and paired. Proteins currently known to belong to the engrailed subfamily are:

- Drosophila segmentation polarity protein engrailed (en), which specifies the body segmentation pattern and is required for the development of the central nervous system.
- Drosophila invected protein (inv).
- Silk moth proteins engrailed and invected, which may be involved in the compartmentalization of the silk gland.
- Honeybee E30 and E60.
- Grasshopper (Schistocerca americana) G-En.
- Mammalian and birds En-1 and En-2.
- Zebrafish Eng-1, -2 and -3.
- Sea urchin (Tripneustes gratilla) SU-HB-en.
- Leech (Helobdella triserialis) Ht-En.
- Caenorhabditis elegans ceh-16.

Engrailed homeobox proteins are characterised by the presence of a conserved region of some 20 amino-acid residues located at the C-terminal of the 'homeobox' domain. As a signature pattern for this subfamily of proteins, a stretch of eight perfectly conserved residues in this region was used.

[0887] Consensus pattern: L-M-A-[EQ]-G-L-Y-N

[1] Scott M.P., Tamkun J.W., Hartzell G.W. III Biochim. Biophys. Acta 989:25-48(1989).
[2] Gehring W.J. Science 236:1245-1252(1987).

[0888] 298. Isocitrate lyase signature (ICL)

Isocitrate lyase (EC 4.1.3.1) [1,2] is an enzyme that catalyzes the conversion of isocitrate to succinate and glyoxylate. This is the first step in the glyoxylate bypass, an alternative to the tricarboxylic acid cycle in bacteria, fungi and plants. A cysteine, a histidine and a glutamate or aspartate have been found to be important for the enzyme's catalytic activity. Only one cysteine residue is conserved between the sequences of the fungal, plant and bacterial enzymes; it is located in the middle of a conserved hexapeptide that can be used as a signature pattern for this type of enzyme.
[0889] Consensus pattern: K-[KR]-C-G-H-[LMQ] [C is a putative active site residue]

[1] Beeching J.R. Protein Seq. Data Anal. 2:463-466(1989).






[2] Atomi H., Ueda M., Hikida M., Hishida T., Teranishi Y., Tanaka A. J. Biochem. 107:262-266(1990).
[0890] 299. Initiation factor 2 subunit

[0891] This family includes initiation factor 2B alpha, beta and delta subunits from eukaryotes, related proteins from archaebacteria and IF-2 from prokaryotes. Initiation factor 2 binds to Met-tRNA, GTP and the small ribosomal subunit.
[0892] [1] Kyrpides NC, Woese CR, Proc Natl Acad Sci U S A 1998;95:3726-3730.
[0893] 300. Initiation factor 3 signature

Initiation factor 3 (IF-3) (gene infC) [1] is one of the three factors required for the initiation of protein biosynthesis in bacteria. IF-3 is thought to function as a fidelity factor during the assembly of the ternary initiation complex, which consists of the 30S ribosomal subunit, the initiator tRNA and the messenger RNA. IF-3 binds to the 30S ribosomal subunit. It is a basic protein of 141 to 212 residues. The chloroplast initiation factor IF-3(chl) is a protein that enhances the poly(A,U,G)-dependent binding of the initiator tRNA to chloroplast ribosomal 30S subunits. In its mature form it is a protein of about 400 residues whose central section is evolutionary related to the sequence of bacterial IF-3 [2]. As a signature pattern, a highly conserved region was selected, located in the central section of bacterial IF-3 and of IF-3(chl).

[0894] Consensus pattern: [KR]-[LIVM](2)-[DN]-[FY]-[GSN]-[KR]-[LIVMFYS]-x-[FY]-[DEQTH]-x(2)-[KRQ]

[1] Liveris D., Schwartz J.J., Geertman R., Schwartz I. FEMS Microbiol. Lett. 112:211-216(1993).

[2] Lin Q., Ma L., Burkhart W., Spremulli L.L. J. Biol. Chem. 269:9436-9444(1994).

[0895] 301. Imidazoleglycerol-phosphate dehydratase signature (IGPD)

Imidazoleglycerol-phosphate dehydratase (EC 4.2.1.19) is the enzyme that catalyzes the seventh step in the biosynthesis of histidine in bacteria, fungi and plants. In most organisms it is a monofunctional protein of about 22 to 29 Kd. In some bacteria such as Escherichia coli, it is the C-terminal domain of a bifunctional protein that includes a histidinol-phosphatase domain [1]. Two signature patterns were developed that each include two consecutive histidine residues.

[0896] Consensus pattern: [LIVMY]-[DE]-x-H-H-x(2)-E-x(2)-[GCA]-[LIVM]-[STAC]-[LIVM]
Consensus pattern: G-x-[DN]-x-H-H-x(2)-E-[STAGC]-x-[FY]-K

[0897] [1] Carlomagno M.S., Chiariotti L., Alifano P., Nappo A.G., Bruni C.B. J. Mol. Biol. 203:585-606(1988).
[0898] 302. Indole-3-glycerol phosphate synthase signature (IGPS)

Indole-3-glycerol phosphate synthase (EC 4.1.1.48) (IGPS) catalyzes the fourth step in the biosynthesis of tryptophan: the ring closure of 1-(2-carboxy-phenylamino)-1-deoxyribulose into indol-3-glycerol-phosphate. In some bacteria, IGPS is a single chain enzyme. In others, such as Escherichia coli, it is the N-terminal domain of a bifunctional enzyme that also catalyzes N-(5'-phosphoribosyl)anthranilate isomerase (PRAI) activity, the third step of tryptophan biosynthesis. In fungi, IGPS is the central domain of a trifunctional enzyme that also contains a PRAI C-terminal domain and a glutamine amidotransferase N-terminal domain. The N-terminal section of IGPS contains a highly conserved region which X-ray crystallography studies [1] have shown to be part of the active site cavity. This region was used as a signature pattern for IGPS.

[0899] Consensus pattern: [LIVMFY]-[LIVMC]-x-E-[LIVMFYC]-K-[KRSP]-[STAK]-S-P-[ST]-x(3)-[LIVMFYST]
[0900] [1] Wilmanns M., Priestle J.P., Niermann T., Jansonius J.N. J. Mol. Biol. 223:477-507(1992).
[0901] 303. (IL2) Interleukin 2. 31 members

[0902] 304. (ILVD_EDD) Dihydroxy-acid and 6-phosphogluconate dehydratases. Two dehydratases have been shown [1] to be evolutionary related:

- Dihydroxy-acid dehydratase (EC 4.2.1.9) (gene ilvD or ILV3), which catalyzes the fourth step in the biosynthesis of isoleucine and valine, the dehydratation of 2,3-dihydroxy-isovaleric acid into alpha-ketoisovaleric acid.
- 6-phosphogluconate dehydratase (EC 4.2.1.12) (gene edd), which catalyzes the first step in the Entner-Doudoroff pathway, the dehydratation of 6-phospho-D-gluconate into 6-phospho-2-dehydro-3-deoxy-D-gluconate.
- Escherichia coli hypothetical protein yjhG.

Both enzymes are proteins of about 600 amino acid residues. Two highly conserved regions have been developed as signature patterns. The first pattern is located in the N-terminal part and contains a cysteine that could be involved in the binding of a 2Fe-2S iron-sulfur cluster [2]. The second pattern is located in the C-terminal half.

[0903] Consensus pattern: C-D-K-x(2)-P-[GA]-x(3)-[GA] [The C could be a 2Fe-2S ligand]
Consensus pattern: [SA]-L-[LIVM]-T-D-[GA]-R-[LIVMF]-S-[GA]-[GAV]-[ST]

[0904] [1] Egan S.E., Fliege R., Tong S., Shibata A., Wolf R.E. Jr., Conway T. J. Bacteriol. 174:4638-4646(1992).
[2] Velasco J.A., Cansado J., Pena M.C., Kawakami T., Laborda J., Notario V. Gene 137:179-185(1993).
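
When an entry provides two signatures with stated locations, both the presence and the placement of the hits can be checked. The sketch below is one possible way to do this and is not a method described in this document: it requires a hit for each pattern and, as an assumed rule of thumb, that the first hit starts in the first half of the sequence and the second hit in the second half. The regular expressions are hand translations of the two consensus patterns, and the function name and test sequences are hypothetical.

```python
import re

# Hand-translated regex forms of the two consensus patterns (assumed reading).
N_TERM_SIG = re.compile(r"CDK.{2}P[GA].{3}[GA]")
C_TERM_SIG = re.compile(r"[SA]L[LIVM]TD[GA]R[LIVMF]S[GA][GAV][ST]")

def has_both_signatures_in_order(seq: str) -> bool:
    """True when both signatures are found, the first in the N-terminal half
    and the second in the C-terminal half (an assumed placement rule)."""
    first = N_TERM_SIG.search(seq)
    second = C_TERM_SIG.search(seq)
    if first is None or second is None:
        return False
    midpoint = len(seq) / 2
    return first.start() < midpoint <= second.start()

# Synthetic sequences (hypothetical, for illustration only).
n_motif = "CDKAAPGAAAG"
c_motif = "SLLTDGRLSGGS"
in_order = "A" * 20 + n_motif + "A" * 60 + c_motif + "A" * 10
swapped = "A" * 20 + c_motif + "A" * 60 + n_motif + "A" * 10
n_only = "A" * 20 + n_motif + "A" * 80

for name, seq in (("in_order", in_order), ("swapped", swapped), ("n_only", n_only)):
    print(name, has_both_signatures_in_order(seq))
```

Checking placement as well as presence guards against a chance match of one short pattern in an unrelated region of the sequence.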
[0905] 305. IMP dehydrogenase / GMP reductase signature

IMP dehydrogenase (EC 1.1.1.205) (IMPDH) catalyzes the rate-limiting reaction of de novo GTP biosynthesis, the NAD-dependent oxidation of IMP into XMP [1]. Inhibition of IMP dehydrogenase activity results in the cessation of DNA synthesis. As IMP dehydrogenase is associated with cell proliferation, it is a possible target for cancer chemotherapy. Mammalian and bacterial IMPDHs are tetramers of identical chains. There are two IMP dehydrogenase isozymes in humans [2]. GMP reductase (EC 1.6.6.8) catalyzes the irreversible and NADPH-dependent reductive deamination of GMP into IMP [3]. It converts nucleobase, nucleoside and nucleotide derivatives of G to A nucleotides, and maintains the intracellular balance of A and G nucleotides. IMP dehydrogenase and GMP reductase share many regions of sequence similarity. One of these regions is centered on a cysteine residue thought [3] to be involved in binding IMP. This region was used as a signature pattern.
[0906] Consensus pattern: [LIVM]-[RK]-[LIVM]-G-[LIVM]-G-x-G-S-[LIVM]-C-x-T [C is the putative IMP-binding residue]

[1] Collart F.R., Huberman E. J. Biol. Chem. 263:15769-15772(1988).

[2] Natsumeda Y., Ohno S., Kawasaki H., Konno Y., Weber G., Suzuki K. J. Biol. Chem. 265:5292-5295(1990).
[3] Andrews S.C., Guest J.R. Biochem. J. 255:35-43(1988).

[0907] 306. (IPPc) Inositol polyphosphate phosphatase family, catalytic domain
[0908] [1] York JD, Ponder JW, Chen ZW, Mathews FS, Majerus PW, Biochemistry 1994;33:13164-13171. [2] Jefferson AB, Auethavekiat V, Pot DA, Williams LT, Majerus PW, J Biol Chem 1997;272:5983-5988. [3] Zhang X, Jefferson AB, Auethavekiat V, Majerus PW, Proc Natl Acad Sci U S A 1995;92:4853-4856. [4] York JD, Majerus PW, Proc Natl Acad Sci U S A 1990;87:9548-9552. [5] Neuwald AF, York JD, Majerus PW, FEBS Lett 1991;294:16-18.

[0909] 307. IQ calmodulin-binding motif

[1] Xie X, Harrison DH, Schlichting I, Sweet RM, Kalabokis VN, Szent-Gyorgyi AG, Cohen C, Nature 1994;368:306-312.

[2] Rhoads AR, Friedberg F, FASEB J 1997;11:331-340.

[0910] 308. Inosine-uridine preferring nucleoside hydrolase family signature (IU_nuc_hydro)

Inosine-uridine preferring nucleoside hydrolase (EC 3.2.2.1) (IU-nucleoside hydrolase or IUNH) is an enzyme first identified in protozoan [1] that catalyzes the hydrolysis of all of the commonly occurring purine and pyrimidine nucleosides into ribose and the associated base, but has a preference for inosine and uridine as substrates. This enzyme is important for these parasitic organisms, which are deficient in de novo synthesis of purines, to salvage the host purine nucleosides. IUNH from Crithidia fasciculata has been sequenced and characterized; it is a homotetrameric enzyme of subunits of 34 Kd. A histidine has been shown to be important for the catalytic mechanism; it acts as a proton donor to activate the hypoxanthine leaving group. IUNH is evolutionary related to a number of uncharacterized proteins from various biological sources, notably:

- Escherichia coli hypothetical protein yaaF.
- Escherichia coli hypothetical protein ybeK.
- Escherichia coli hypothetical protein yeiK.
- Fission yeast hypothetical protein SpAC17G8.02.
- Yeast hypothetical protein YDR400w.
- A hypothetical protein from the archaebacteria Desulfurolobus ambivalens.

As a signature pattern for these proteins, a highly conserved region was selected, located in the N-terminal extremity. This region contains four conserved aspartates that have been shown [2] to be located in the active site cavity.
[0911] Consensus pattern: D-x-D-[PT]-[GA]-x-D-D-[TAV]-[VI]-A

[1] Gopaul D.N., Meyer S.L., Degano M., Sacchettini J.C., Schramm V.L. Biochemistry 35:5963-5970(1996).

[2] Degano M., Gopaul D.N., Scapin G., Schramm V.L., Sacchettini J.C. Biochemistry 35:5971-5977(1996).

[0912] 309. (Insulinase)
Insulinase family, zinc-binding region signature
(aka Peptidase_M16)

[0913] A number of proteases dependent on divalent cations for their activity have been shown [1,2] to belong to
one family, on the basis of sequence similarity. These enzymes are listed below.

- Insulinase (EC 3.4.24.56) (also known as insulysin or insulin-degrading enzyme or IDE), a cytoplasmic enzyme
which seems to be involved in the cellular processing of insulin, glucagon and other small polypeptides.

- Escherichia coli protease III (EC 3.4.24.55) (pitrilysin) (gene ptr), a periplasmic enzyme that degrades small
peptides.

- Mitochondrial processing peptidase (EC 3.4.24.64) (MPP). This enzyme removes the transit peptide from the
precursor form of proteins imported from the cytoplasm across the mitochondrial inner membrane. It is composed of
two nonidentical homologous subunits termed alpha and beta. The beta subunit seems to be catalytically active
while the alpha subunit has probably lost its activity.

- Nardilysin (EC 3.4.24.61) (N-arginine dibasic convertase or NRD convertase). This mammalian enzyme cleaves
peptide substrates on the N-terminus of Arg residues in dibasic stretches.

- Klebsiella pneumoniae protein pqqF. This protein is required for the biosynthesis of the coenzyme
pyrrolo-quinoline-quinone (PQQ). It is thought to be a protease that cleaves peptide bonds in a small peptide
(gene pqqA), thus providing the glutamate and tyrosine residues necessary for the synthesis of PQQ.

- Yeast protein AXL1, which is involved in axial budding [3].

- Eimeria bovis sporozoite developmental protein.

- Escherichia coli hypothetical protein yddC and HI1368, the corresponding Haemophilus influenzae protein.

- Bacillus subtilis hypothetical protein ymxG.

- Caenorhabditis elegans hypothetical proteins C26F5.4 and F56D2.1.

[0914] It should be noted that in addition to the above enzymes, this family also includes the core proteins I and II
of the mitochondrial bc1 complex (also called cytochrome c reductase or complex III), but the situation as to the activity
or lack of activity of these subunits is quite complex:

- In mammals and yeast, core proteins I and II lack enzymatic activity.
- In Neurospora crassa and in potato, core protein I is equivalent to the beta subunit of MPP.
- In Euglena gracilis, core protein I seems to be active, while subunit II is inactive.

[0915] These proteins do not share many regions of sequence similarity; the most noticeable is in the N-terminal
section. This region includes a conserved histidine followed, two residues later, by a glutamate and another histidine.
In pitrilysin, it has been shown [4] that this H-x-x-E-H motif is involved in enzyme activity; the two histidines bind zinc
and the glutamate is necessary for catalytic activity. Non-active members of this family have lost from one to three of
these active site residues. We developed a signature pattern that detects active members of this family as well as some
inactive members.
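
The relationship between the H-x-x-E-H motif and loss of activity can be illustrated with a short sketch. It is not part of the described method: the function name and the example windows are hypothetical, and the only assumption taken from the text is that the two histidines and the glutamate sit at positions 1, 4 and 5 of a five-residue window.

```python
def lost_active_site_residues(window: str) -> int:
    """Count how many of the three catalytic positions of H-x-x-E-H are not conserved.

    `window` is a five-residue stretch aligned on the motif (hypothetical input).
    Positions 1 and 5 are the zinc-binding histidines, position 4 the glutamate.
    """
    if len(window) != 5:
        raise ValueError("expected a five-residue window aligned on H-x-x-E-H")
    expected = {0: "H", 3: "E", 4: "H"}  # 0-based index -> conserved residue
    return sum(1 for index, residue in expected.items()
               if window[index].upper() != residue)


# Illustrative windows only; they are not taken from any sequence database entry.
for example in ("HFLEH", "HYLEK", "NALQR"):
    print(example, lost_active_site_residues(example))
```

A result of 0 corresponds to a window that retains all three residues; values of 1 to 3 correspond to the "lost from one to three" cases mentioned above.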

[0916] Consensus pattern: G-x(8,9)-G-x-[STA]-H-[LIVMFY]-[LIVMC]-[DERN]-[HRKL]-[LMFAT]-x-[LFSTH]-x-
[GSTAN]-[GST] [The two H are zinc ligands] [E is the active site residue] Sequences known to belong to this class
detected by the pattern: ALL active members as well as all MPP alpha subunits and core II subunits. Does not detect
inactive core I subunits.

[0917] Note: these proteins belong to family M16 in the classification of peptidases [5].

[1] Rawlings N.D., Barrett A.J. Biochem. J. 275

[2] Braun H.-P., Schmitz U.K. Trends Biochem. Sci. 20:171-175(1995).

[3] Becker A.B., Roth R.A. Proc. Natl. Acad. Sci. U.S.A. 89:3835-3839(1992).

[4] Fujita A., Oka C., Arikawa Y., Katagai T., Tonouchi A., Kuhara S., Misumi Y. Nature 372:567-570(1994).
[5] Rawlings N.D., Barrett A.J. Meth. Enzymol. 248:183-228(1995).

[0918] 310. Involucrin repeat

[0919] Eckert RL, Yaffe MB, Crish JF, Murthy S, Rorke EA, Welter JF: J Invest Dermatol 1993;100:613-617.
[0920] 311. Isochorismatase family. Members of this family are hydrolase enzymes.

[0921] Romao MJ, Turk D, Gomis-Ruth FX, Huber R, Schumacher G, Mollering H, Russmann L: J Mol Biol 1992;
226:1111-1130.

[0922] 312. Inositol monophosphatase family signatures (inositol_P)

It has been shown [1] that several proteins share two sequence motifs. Two of these proteins are enzymes of the
inositol phosphate second messenger signaling pathway: - Vertebrate and plant inositol monophosphatase (EC
3.1.3.25). - Vertebrate inositol polyphosphate 1-phosphatase (EC 3.1.3.57). The function of the other proteins is not
yet clear: - Bacterial protein cysQ. CysQ could help to control the pool of PAPS (3'-phosphoadenoside 5'-phosphosulfate),
or be useful in sulfite synthesis. - Escherichia coli protein suhB. Mutations in suhB result in the enhanced
synthesis of heat shock sigma factor (htpR). - Neurospora crassa protein Qa-X. Probably involved in quinate metabolism.
- Emericella nidulans protein qutG. Probably involved in quinate metabolism. - Yeast protein HAL2/MET22 [2], involved
in salt tolerance as well as methionine biosynthesis. - Yeast hypothetical protein YHR046c. - Caenorhabditis
elegans hypothetical protein F13G3.5. - A Rhizobium leguminosarum hypothetical protein encoded upstream of
the pss gene for exopolysaccharide synthesis. - Methanococcus jannaschii hypothetical protein MJ0109. It is suggested
[1] that these proteins may act by enhancing the synthesis or degradation of phosphorylated messenger molecules.
From the X-ray structure of human inositol monophosphatase [3], it seems that some of the conserved residues are
involved in binding a metal ion and/or the phosphate group of the substrate.

[0923] Consensus pattern: [FWV]-x(0,1)-[LIVM]-D-P-[LIVM]-D-[SG]-[ST]-x(2)-[FY]-x-
[HKRNSTY] [The first D and the T bind a metal ion]

Consensus pattern: [WY]-D-x-[AC]-[GSA]-[GSAPV]-x-[LIVACP]-[LIV]-[LIVAC]-x(3)-[GH]-[GA]

[1] Neuwald A.F., York J.D., Majerus P.W. FEBS Lett. 294:16-18(1991).

[2] Glaeser H.-U., Thomas D., Gaxiola R., Montrichard F., Surdin-Kerjan Y., Serrano R. EMBO J. 12:3105-3110
(1993).

[3] Bone R., Springer J.P., Atack J.R. Proc. Natl. Acad. Sci. U.S.A. 89:10031-10035(1992).

[0924] 313. Ion transport protein

[0925] This family contains sodium, potassium and calcium ion channels. Members of this family have six
transmembrane helices, in which the last two helices flank a loop which determines ion selectivity. In some sub-families
(e.g. Na channels) the domain is repeated four times, whereas in others (e.g. K channels) the protein forms a tetramer
in the membrane. A bacterial structure of the protein is known for the last two helices, but it is not in the Pfam family
because it lacks the first four helices.
[0926] 314. Isocitrate and isopropylmalate dehydrogenases signature (isodh)

Isocitrate dehydrogenase (IDH) [1,2] is an important enzyme of carbohydrate metabolism which catalyzes the oxidative
decarboxylation of isocitrate into alpha-ketoglutarate. IDH is either dependent on NAD+ (EC 1.1.1.41) or on NADP+
(EC 1.1.1.42). In eukaryotes there are at least three isozymes of IDH: two are located in the mitochondrial matrix (one
NAD+-dependent, the other NADP+-dependent), while the third one (also NADP+-dependent) is cytoplasmic. In
Escherichia coli the activity of a NADP+-dependent form of the enzyme is controlled by the phosphorylation of a serine
residue; the phosphorylated form of IDH is completely inactivated. 3-isopropylmalate dehydrogenase (EC 1.1.1.85)
(IMDH) [3,4] catalyzes the third step in the biosynthesis of leucine in bacteria and fungi, the oxidative decarboxylation
of 3-isopropylmalate into 2-oxo-4-methylvalerate. Tartrate dehydrogenase (EC 1.1.1.93) [5] catalyzes the reduction of
tartrate to oxaloglycolate. These enzymes are evolutionarily related [1,3,4,5]. The best conserved region of these
enzymes is a glycine-rich stretch of residues located in the C-terminal section. This region was used as a signature pattern.
[0927] Consensus pattern: [NS]-[LIMYT]-[FYDN]-G-[DNT]-[IMVY]-x-[STGDN]-[DN]-x(2)-[SGAP]-x(3,4)-G-[STG]-
[LIVMPA]-G-[LIVMF]

[1] Hurley J.H., Thorsness P.E., Ramalingam V., Helmers N.H., Koshland D.E. Jr., Stroud R.M. Proc. Natl. Acad.
Sci. U.S.A. 86:8635-8639(1989).

[2] Cupp J.R., McAlister-Henn L. J. Biol. Chem. 266:22199-22205(1991).

[3] Imada K., Sato M., Tanaka N., Katsube Y., Matsuura Y., Oshima T. J. Mol. Biol. 222:725-738(1991).
[4] Zhang T., Koshland D.E. Jr. Protein Sci. 4:84-92(1995).
[5] Tipton P.A., Beecher B.S. Arch. Biochem. Biophys. 313:15-21(1994).

[0928] 315. Jacalin-like lectin domain.

[0929] Proteins containing this domain are lectins. It is found in 1 to 6 copies in these proteins. The domain is also
found in the animal prostatic spermine-binding protein (Swiss:P15501).
[0930] [1] Sankaranarayanan R, Sekar K, Banerjee R, Sharma V, Surolia A, Vijayan M: Nat Struct Biol 1996;3:
596-603.

[0931] 316. KH domain

[0932] KH motifs probably bind RNA directly. Autoantibodies to Nova, a KH domain protein, cause paraneoplastic
opsoclonus ataxia.

[1] Burd CG, Dreyfuss G: Science 1994;265:615-621.

[2] Musco G, Stier G, Joseph C, Castiglione Morelli MA, Nilges M, Gibson TJ, Pastore A: Cell 1996;85:237-245.
[0933] 317. Kelch motif

[0934] The kelch motif was initially discovered in Kelch (Swiss:Q04652). In this protein there are six copies of the
motif. It has been shown that Swiss:Q04652 is related to galactose oxidase [1], for which a structure has been solved
[2]. The kelch motif forms a beta sheet. Several of these sheets associate to form a beta propeller structure as found
in neur,

[0935] [1] Bork P, Doolittle RF: J Mol Biol 1994;236:1277-1282. [2] Ito N, Phillips SE, Stevens C, Ogel ZB, McPherson
MJ, Keen JN, Yadav KD, Knowles PF: Nature 1991;350:87-90.

[0936] 318. Soybean trypsin inhibitor (Kunitz) protease inhibitors family signature

[0937] The soybean trypsin inhibitor (Kunitz) family [1] is one of the numerous families of proteinase inhibitors. It
comprises plant proteins which have inhibitory activity against serine proteinases from the trypsin and subtilisin families,
thiol proteinases and aspartic proteinases, as well as some proteins that are probably involved in seed storage. This
family is currently known to group the following proteins: - Trypsin inhibitors A, B, C, KTI1 and KTI2 from soybean. -
Trypsin inhibitor DE3 from coral beans (Erythrina sp.). - Trypsin inhibitor DE5 from sandal bead tree. - Trypsin inhibitors
1A (WTI-1A), 1B (WTI-1B), and 2 (WTI-2) from goa bean. - Trypsin inhibitor from Acacia confusa. - Trypsin inhibitor
from silk tree. - Chymotrypsin inhibitor 3 (WCI-3) from goa bean. - Cathepsin D inhibitors PDI and NDI from potato [2],
which inhibit both cathepsin D (aspartic proteinase) and trypsin. - Alpha-amylase/subtilisin inhibitors from barley and
wheat. - Albumin-1 (WBA-1) from goa bean seeds [3]. - Miraculin from Richadella dulcifica [4], a sweet taste protein.
- Sporamin from sweet potato [5], the major tuberous root protein. - Thiol proteinase inhibitor PCPI 8.3 (P340) from
potato tuber [6]. - Wound responsive protein gwin3 from poplar tree [7]. - 21 Kd seed protein from cocoa [8]. All these
proteins contain from 170 to 200 amino acid residues and one or two intrachain disulfide bonds. The best conserved
region is found in their N-terminal section and is used as a signature pattern.
[0938] Consensus pattern: [LIVM]-x-D-x-[EDNTY]-[DG]-[RKHDENQ]-x-[LIVM]-x(5)-Y-x-[LIVM]
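
The consensus patterns in this section use the PROSITE notation: `x` is any residue, `[...]` lists allowed residues, `{...}` lists excluded residues, and `(n)` or `(n,m)` gives a repeat count. As a minimal sketch of how such a pattern can be applied to a sequence, the following translates the notation into a Python regular expression. The function name, the pattern string (the Kunitz pattern as read from the paragraph above) and the test fragment are assumptions made for illustration; the sketch ignores the anchors (`<`, `>`) that full PROSITE syntax also allows.

```python
import re

# One pattern element: a residue, an allowed set [..] or an excluded set {..},
# optionally followed by a repeat count (n) or (n,m).
_ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a regular-expression string."""
    pieces = []
    for element in pattern.strip().split("-"):
        match = _ELEMENT.fullmatch(element.strip())
        if match is None:
            raise ValueError(f"unrecognised pattern element: {element!r}")
        residue, low, high = match.groups()
        if residue == "x":
            residue = "."                             # any residue
        elif residue.startswith("{"):
            residue = "[^" + residue[1:-1] + "]"      # excluded residues
        if low is None:
            repeat = ""
        elif high is None:
            repeat = "{" + low + "}"
        else:
            repeat = "{" + low + "," + high + "}"
        pieces.append(residue + repeat)
    return "".join(pieces)


# Pattern as read from the text above; the fragment is an illustrative test string.
KUNITZ = "[LIVM]-x-D-x-[EDNTY]-[DG]-[RKHDENQ]-x-[LIVM]-x(5)-Y-x-[LIVM]"
kunitz_regex = re.compile(prosite_to_regex(KUNITZ))
fragment = "DFVLDNEGNPLENGGTYYILSDITAF"
hit = kunitz_regex.search(fragment)
if hit:
    print(hit.start() + 1, hit.group())  # 1-based start position and matched stretch
```

The same helper can be applied to the other consensus patterns listed in this section, provided the pattern string has first been restored to valid notation.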

[1] Laskowski M., Kato I. Annu. Rev. Biochem. 49:593-626(1980).
[2] Ritonja A., Krizaj I., Mesko P., Kopitar M., Lucovnik P., Strukelj B., Pungercar J., Buttle D.J., Barrett A.J., Turk
V. FEBS Lett. 267:13-15(1990).

[3] Kortt A.A., Strike P.M., de Jersey J. Eur. J. Biochem. 181:403-408(1989).

[4] Theerasilp S., Hitotsuya H., Nakajo S., Nakaya K., Nakamura Y., Kurihara Y. J. Biol. Chem. 264:6655-6659
(1989).

[5] Hattori T., Yoshida N., Nakamura K. Plant Mol. Biol. 13:563-572(1989).

[6] Krizaj I., Drobnic-Kosorok M., Brzin J., Jerala R., Turk V. FEBS Lett. 333:15-20(1993).

[7] Bradshaw H.D., Hollick J.B., Parsons T.J., Clarke H.R.G., Gordon M.P. Plant Mol. Biol. 14:51-59(1989).

[8] Tai H., McHenry L., Fritz P.J., Furtek D.B. Plant Mol. Biol. 16:913-915(1991).

[0939] 319. Beta-ketoacyl synthases active site

Beta-ketoacyl-ACP synthase (KAS) [1] is the enzyme that catalyzes the condensation of malonyl-ACP with the growing
fatty acid chain. It is found as a component of the following enzymatic systems: - Fatty acid synthetase (FAS), which
catalyzes the formation of long-chain fatty acids from acetyl-CoA, malonyl-CoA and NADPH. Bacterial and plant
chloroplast FAS are composed of eight separate subunits which correspond to different enzymatic activities; beta-ketoacyl
synthase is one of these polypeptides. Fungal FAS consists of two multifunctional proteins, FAS1 and FAS2; the
beta-ketoacyl synthase domain is located in the C-terminal section of FAS2. Vertebrate FAS consists of a single
multifunctional chain; the beta-ketoacyl synthase domain is located in the N-terminal section [2]. - The multifunctional
6-methylsalicylic acid synthase (MSAS) from Penicillium patulum [3]. This is a multifunctional enzyme involved in the
biosynthesis of a polyketide antibiotic and which has a KAS domain in its N-terminal section. - Polyketide antibiotic
synthase enzyme systems. Polyketides are secondary metabolites produced by microorganisms and plants from simple fatty
acids. KAS is one of the components involved in the biosynthesis of the Streptomyces polyketide antibiotics granaticin
[4], tetracenomycin C [5] and erythromycin. - Emericella nidulans multifunctional protein Wa. Wa is involved in the
biosynthesis of conidial green pigment. Wa is a protein of 216 Kd that contains a KAS domain. - Rhizobium nodulation
protein nodE, which probably acts as a beta-ketoacyl synthase in the synthesis of the nodulation Nod factor fatty acyl
chain. - Yeast mitochondrial protein CEM1. The condensation reaction is a two-step process: the acyl component of
an activated acyl primer is transferred to a cysteine residue of the enzyme and is then condensed with an activated
malonyl donor with the concomitant release of carbon dioxide. The sequence around the active site cysteine is well
conserved and can be used as a signature pattern.

[0940] Consensus pattern: G-x(4)-[LIVMFAP]-x(2)-[AGC]-C-[STA](2)-[STAG]-x(3)-[LIVMF] [C is the active site
residue]



[1] Kauppinen S., Siggaard-Andersen M., von Wettstein-Knowles P. Carlsberg Res. Commun. 53:357-370(1988).

[2] Witkowski A., Rangan V.S., Randhawa Z.I., Amy C.M., Smith S. Eur. J. Biochem. 198:571-579(1991).

[3] Beck J., Ripka S., Siegner A., Schiltz E., Schweizer E. Eur. J. Biochem. 192:487-498(1990).

[4] Bibb M.J., Biro S., Motamedi H., Collins J.F., Hutchinson C.R. EMBO J. 8:2727-2738(1989).

[5] Sherman D.H., Malpartida F., Bibb M.J., Kieser H.M., Bibb M.J., Hopwood D.A. EMBO J. 8:2717-2725(1989).



[0941] 320. Kinesin motor domain signature and profile

Kinesin [1,2,3] is a microtubule-associated force-producing protein that may play a role in organelle transport. Kinesin
is an oligomeric complex composed of two heavy chains and two light chains. The kinesin motor activity is directed
toward the microtubule's plus end. The heavy chain is composed of three structural domains: a large globular N-terminal
domain which is responsible for the motor activity of kinesin (it is known to hydrolyze ATP, to bind and move on
microtubules), a central alpha-helical coiled coil domain that mediates the heavy chain dimerization; and a small globular
C-terminal domain which interacts with other proteins (such as the kinesin light chains), vesicles and membranous
organelles. A number of proteins have been recently found that contain a domain similar to that of the kinesin "motor"
domain [1,4,E1]: - Drosophila claret segregational protein (ncd). Ncd is required for normal chromosomal segregation
in meiosis, in females, and in early mitotic divisions of the embryo. The ncd motor activity is directed toward the
microtubule's minus end. - Drosophila kinesin-like protein (nod). Nod is required for the distributive chromosome
segregation of nonexchange chromosomes during meiosis. - Human CENP-E [4]. CENP-E is a protein that associates with
kinetochores during chromosome congression, relocates to the spindle midzone at anaphase, and is quantitatively
discarded at the end of the cell division. CENP-E is probably an important motor molecule in chromosome movement
and/or spindle elongation. - Human mitotic kinesin-like protein-1 (MKLP-1), a motor protein whose activity is directed
toward the microtubule's plus end. - Yeast KAR3 protein, which is essential for yeast nuclear fusion during mating.
KAR3 may mediate microtubule sliding during nuclear fusion and possibly mitosis. - Yeast CIN8 and KIP1 proteins,
which are required for the assembly of the mitotic spindle. Both proteins seem to interact with spindle microtubules to
produce an outwardly directed force acting upon the poles. - Fission yeast cut7 protein, which is essential for spindle
body duplication during mitotic division. - Emericella nidulans bimC, which plays an important role in nuclear division.
- Emericella nidulans klpA. - Caenorhabditis elegans unc-104, which may be required for the transport of substances
needed for neuronal cell differentiation. - Caenorhabditis elegans osm-3. - Xenopus Eg5, which may be involved in
mitosis. - Arabidopsis thaliana KatA, KatB and katC. - Chlamydomonas reinhardtii FLA10/KHP1 and KLP1. Both
proteins seem to play a role in the rotation or twisting of the microtubules of the flagella. - Caenorhabditis elegans
hypothetical protein T09A5.2. The kinesin motor domain is located in the N-terminal part of most of the above proteins,
with the exception of KAR3, klpA, and ncd, where it is located in the C-terminal section. The kinesin motor domain contains
about 330 amino acids. An ATP-binding motif of type A is found near position 80 to 90; the C-terminal half of the domain
is involved in microtubule-binding. The signature pattern for that domain is derived from a conserved decapeptide inside
the microtubule-binding part.

Consensus pattern: [GSA]-[KRHPSTQVM]-[LIVMF]-x-[LIVMF]-[IVC]-D-L-[AH]-G-[SAN]-E



[1] Bloom G.S., Endow S.A. Protein Prof. 2:1109-1171(1995).

[2] Vallee R.B., Shpetner H.S. Annu. Rev. Biochem. 59:909-932(1990).

[3] Brady S.T. Trends Cell Biol. 5:159-164(1995).

[4] Endow S.A. Trends Biochem. Sci. 16:221-225(1991). [E1]



[0942] 321. Ribosomal protein L15 signature

Ribosomal protein L15 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L15 is known to bind
the 23S rRNA. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1], groups: -
Eubacterial L15. - Plant chloroplast L15 (nuclear-encoded). - Archaebacterial L15. - Vertebrate L27a. - Tetrahymena
thermophila L29. - Fungi L27a (L29, CRP-1, CYH2). L15 is a protein of 144 to 154 amino-acid residues. As a signature
pattern, a conserved region was selected in the C-terminal section of these proteins.

[0943] Consensus pattern: K-[LIVM](2)-[GASL]-x-[GT]-x-[LIVMA]-x(2,5)-[LIVM]-x-[LIVMF]-x(3,4)-[LIVMFCA]-[ST]-x
(2)-A-x(3)-[LIVM]-x(3)-G

[0944] [1] Otaka E., Hashimoto T., Mizuta K., Suzuki K. Protein Seq. Data Anal. 5:301-313(1993).

[0945] 322. LBP / BPI / CETP family signature

The following mammalian lipid-binding serum glycoproteins belong to the same family [1,2,3]: - Lipopolysaccharide-
binding protein (LBP). LBP binds to the lipid A moiety of bacterial lipopolysaccharides (LPS), a glycolipid present in
the outer membrane of all Gram-negative bacteria. The LBP/LPS complex seems to interact with the CD14 receptor
and may be responsible for the secretion of alpha-TNF. - Bactericidal permeability-increasing protein (BPI). Like LBP,
BPI binds LPS and has a cytotoxic activity on Gram-negative bacteria. - Cholesteryl ester transfer protein (CETP).
CETP is involved in the transfer of insoluble cholesteryl esters in reverse cholesterol transport. - Phospholipid transfer
protein (PLTP). May play a key role in extracellular phospholipid transport and modulation of HDL particles. These
proteins are structurally related and share many regions of sequence similarity. As a signature pattern, one of these
regions was selected, which is located in the N-terminal section of these proteins, a region which could be involved in
the binding to the lipids [2].

Consensus pattern: [PA]-[GA]-[LIVMC]-x(2)-R-[IV]-[ST]-x(3)-L-x(5)-[EQ]-x(4)-[LIVM]-[EQK]-x(8)-P

[1] Schumann R.R., Leong S.R., Flaggs G.W., Gray P.W., Wright S.D., Mathison J.C., Tobias P.S., Ulevitch R.J.
Science 249:1429-1431(1990).

[2] Gray P.W., Flaggs G., Leong S.R., Gumina R.J., Weiss J., Ooi C.E., Elsbach P. J. Biol. Chem. 264:9505-9509
(1989).

[3] Day J.R., Albers J.J., Lofton-Day C.E., Gilbert T.L., Ching A.F.T., Grant F.J., O'Hara P.J., Marcovina S.M.,
Adolphson J.L. J. Biol. Chem. 269:9388-9391(1994).



[0946] 323. LIM domain signature and profile

Recently [1,2] a number of proteins have been found to contain a conserved cysteine-rich domain of about 60 amino-
acid residues. These proteins are: - Caenorhabditis elegans mec-3, a protein required for the differentiation of the set
of six touch receptor neurons in this nematode. - Caenorhabditis elegans lin-11, a protein required for the asymmetric
division of vulval blast cells. - Vertebrate insulin gene enhancer binding protein isl-1. Isl-1 binds to one of the two
cis-acting protein-binding domains of the insulin gene. - Vertebrate homeobox proteins lim-1, lim-2 (lim-5) and lim3. -
Vertebrate lmx-1, which acts as a transcriptional activator by binding to the FLAT element, a beta-cell-specific
transcriptional enhancer found in the insulin gene. - Mammalian LH-2, a transcriptional regulatory protein involved in the
control of cell differentiation in developing lymphoid and neural cell types. - Drosophila protein apterous, required for
the normal development of the wing and halter imaginal discs. - Vertebrate protein kinases LIMK-1 and LIMK-2. -
Mammalian rhombotins. Rhombotin-1 (RBTN1 or TTG-1) and rhombotin-2 (RBTN2 or TTG-2) are proteins of about
160 amino acids whose genes are disrupted by chromosomal translocations in T-cell leukemia. - Mammalian and avian
cysteine-rich protein (CRP), a 192 amino-acid protein of unknown function. Seems to interact with zyxin. - Mammalian
cysteine-rich intestinal protein (CRIP), a small protein which seems to have a role in zinc absorption and may function
as an intracellular zinc transport protein. - Vertebrate paxillin, a cytoskeletal focal adhesion protein. - Mouse testin.
Mouse testin should not be confused with rat testin, which is a thiol protease homolog. - Sunflower pollen specific protein
SF3. - Chicken zyxin. Zyxin is a low-abundance adhesion plaque protein which has been shown to interact with CRP.
- Yeast protein LRG1, which is involved in sporulation [4]. - Yeast rho-type GTPase activating protein RGA1/DBM1. -
Caenorhabditis elegans homeobox protein ceh-14. - Caenorhabditis elegans homeobox protein unc-97. - Yeast
hypothetical protein YKR090w. - Caenorhabditis elegans hypothetical protein C28H8.6. These proteins generally have two
tandem copies of a domain, called LIM (for Lin-11 Isl-1 Mec-3), in their N-terminal section. Zyxin and paxillin are
exceptions in that they contain respectively three and four LIM domains at their C-terminal extremity. In apterous, isl-1,
LH-2, lin-11, lim-1 to lim-3, lmx-1, ceh-14 and mec-3 there is a homeobox domain some 50 to 95 amino acids after
the LIM domains. In the LIM domain, there are seven conserved cysteine residues and a histidine. The arrangement
followed by these conserved residues is C-x(2)-C-x(16,23)-H-x(2)-[CH]-x(2)-C-x(2)-C-x(16,21)-C-x(2,3)-[CHD]. The
LIM domain binds two zinc ions [5]. LIM does not bind DNA; rather, it seems to act as an interface for protein-protein
interaction. A pattern was developed that spans the first half of the LIM domain.

[0947] Consensus pattern: C-x(2)-C-x(15,21)-[FYWH]-H-x(2)-[CH]-x(2)-C-x(2)-C-x(3)-[LIVMF] [The 5 C's and the H
bind zinc]



[1] Freyd G., Kim S.K., Horvitz H.R. Nature 344:876-879(1990).
[2] Baltz R., Evrard J.-L., Domon C., Steinmetz A. Plant Cell 4:1465-1466(1992).
[3] Sanchez-Garcia I., Rabbitts T.H. Trends Genet. 10:315-320(1994).
[4] Mueller A., Xu G., Wells R., Hollenberg C.P., Piepersberg W. Nucleic Acids Res. 22:3151-3154(1994).

[5] Michelsen J.W., Schmeichel K.L., Beckerle M.C., Winge D.R. Proc. Natl. Acad. Sci. U.S.A. 90:4404-4408
(1993).

[0948] 324. (LRR) Leucine Rich Repeat

CAUTION: This Pfam may not find all Leucine Rich Repeats in a protein. Leucine Rich Repeats are short sequence
motifs present in a number of proteins with diverse functions and cellular locations. These repeats are usually involved
in protein-protein interactions. Each Leucine Rich Repeat is composed of a beta-alpha unit. These units form elongated
non-globular structures. Leucine Rich Repeats are often flanked by cysteine-rich domains. Number of members: 301?
[1] The leucine-rich repeat: a versatile binding motif. Kobe B, Deisenhofer J: Trends Biochem Sci 1994;19:415-421.

[2] Crystal structure of porcine ribonuclease inhibitor, a protein with leucine-rich repeats. Kobe B, Deisenhofer J: Nature
1993;366:751-756.

[0949] 325. Plant lipid transfer protein family signature (LTP)

[0950] Plant cells contain proteins, called lipid transfer proteins (LTP) [1,2,3], which are able to facilitate the transfer
of phospholipids and other lipids across membranes. These proteins, whose subcellular location is not yet known, could
play a major role in membrane biogenesis by conveying phospholipids, such as waxes or cutin, from their site of
biosynthesis to membranes unable to form these lipids. Plant LTP's are proteins of about 9 Kd (90 amino acids) which
contain eight conserved cysteine residues, all involved in disulfide bridges, as shown in the following schematic
representation.

xCxxxxCxxxxxxCCxxxxxxxxCxCxxxxxxxxxxxCxxxxxxCxx
[Schematic: disulfide-bond connectors between the eight conserved cysteines]

'C': conserved cysteine involved in a disulfide bond.
'*': position of the pattern.
[0951] Consensus pattern: [LIVM]-[PA]-x(2)-C-x-[LIVM]-x-[LIVM]-x-[LIVMFY]-x-[LIVM]-[ST]-x(3)-[DN]-C-x(2)-
[LIVM] [The two C's are involved in disulfide bonds]

[1] Wirtz K.W.A. Annu. Rev. Biochem. 60:73-99(1991).
[2] Arondel V., Kader J.C. Experientia 46:579-585(1990).

[3] Ohlrogge J.B., Browse J., Somerville C.R. Biochim. Biophys. Acta 1082:1-26(1991).

[0952] 326. (LAMP) Lysosome-associated membrane glycoproteins signatures

Lysosome-associated membrane glycoproteins (lamp) [1] are integral membrane proteins, specific to lysosomes, and
whose exact biological function is not yet clear. Structurally, the lamp proteins consist of two internally homologous
lysosome-luminal domains separated by a proline-rich hinge region; at the C-terminal extremity there is a transmembrane
region followed by a very short cytoplasmic tail. In each of the duplicated domains, there are two conserved
disulfide bonds. This structure is schematically represented in the figure below.

 +-----+            +-----+         +-----+            +-----+
 |     |            |     |         |     |            |     |
xCxxxxxCxxxxxxxxxxxxCxxxxxCxxxxxxxxxCxxxxxCxxxxxxxxxxxxCxxxxxCxxxxxxxx
<-------------------------><-Hinge-><------------------------><TM><C>

In mammals, there are two closely related types of lamp: lamp-1 and lamp-2. In chicken, lamp-1 is known as LEP100. The
macrophage protein CD68 (or macrosialin) [2] is a heavily glycosylated integral membrane protein whose structure
consists of a mucin-like domain followed by a proline-rich hinge, a single lamp-like domain, a transmembrane region
and a short cytoplasmic tail. Two signature patterns for this family of proteins were developed. The first one is centered
on the first conserved cysteine of the duplicated domains. The second corresponds to a region that includes the
extremity of the second domain, the totality of the transmembrane region and the cytoplasmic tail.
[0953] Consensus pattern: [STA]-C-[LIVM]-[LIVMFYW]-A-x-[LIVMFYW]-x(3)-[LIVMFYW]-x(3)-Y [C is involved in a
disulfide bond]

Consensus pattern: C-x(2)-D-x(3,4)-[LIVM](2)-P-[LIVM]-x-[LIVM]-G-x(2)-[LIVM]-x-G-[LIVM](2)-x-[LIVM](4)-A-[FY]-x-
[LIVM]-x(2)-[KR]-[RH]-x(1,2)-[STAG](2)-Y-[EQ] [C is involved in a disulfide bond]

[1] Fukuda M. J. Biol. Chem. 266:21327-21330(1991).

[2] Holness C.L., da Silva R.P., Fawcett J., Gordon S., Simmons D.L. J. Biol. Chem. 268:9661-9666(1993).

[0954] 327. Lipolytic enzymes "G-D-S-L" family, serine active site

[0955] Recently [1], a family of lipolytic enzymes has been characterized. This family currently consists of the following
proteins:

- Aeromonas hydrophila lipase/phosphatidylcholine-sterol acyltransferase.

- Xenorhabdus luminescens lipase 1.

- Vibrio mimicus arylesterase.

- Escherichia coli acyl-coA thioesterase I (gene tesA).

- Vibrio parahaemolyticus thermolabile hemolysin/atypical phospholipase.
- Rabbit phospholipase (AdRab-B), an intestinal brush border protein with esterase and phospholipase
A/lysophospholipase activity that could be involved in the uptake of dietary lipids. AdRab-B contains four repeats of about
320 amino acids.

- Arabidopsis thaliana and Brassica napus anther-specific proline-rich protein APG.

- A Pseudomonas putida hypothetical protein in the trpE-trpG intergenic region.

A serine has been identified as part of the active site in the Aeromonas, Vibrio mimicus and Escherichia coli
enzymes. It is located in a conserved sequence motif that can be used as a signature pattern for these proteins.

- Consensus pattern: [LIVMFYAG](4)-G-D-S-[LIVM]-x(1,2)-[TAG]-G [S is the active site residue]

[0956] 328. (Lipoprotein 4) Prokaryotic membrane lipoprotein lipid attachment site. In prokaryotes, membrane
lipoproteins are synthesized with a precursor signal peptide, which is cleaved by a specific lipoprotein signal peptidase
(signal peptidase II). The peptidase recognizes a conserved sequence and cuts upstream of a cysteine residue to which
a glyceride-fatty acid lipid is attached [1]. Some of the proteins known to undergo such processing currently include
(for recent listings see [1,2,3]): - Major outer membrane lipoprotein (murein-lipoproteins) (gene lpp). - Escherichia coli
lipoprotein-28 (gene nlpA). - Escherichia coli lipoprotein-34 (gene nlpB). - Escherichia coli lipoprotein nlpC. -
Escherichia coli lipoprotein nlpD. - Escherichia coli osmotically inducible lipoprotein B (gene osmB). - Escherichia coli
osmotically inducible lipoprotein E (gene osmE). - Escherichia coli peptidoglycan-associated lipoprotein (gene pal). -
Escherichia coli rare lipoproteins A and B (genes rplA and rplB). - Escherichia coli copper homeostasis protein cutF
(or nlpE). - Escherichia coli plasmids traT proteins. - Escherichia coli Col plasmids lysis proteins. - A number of Bacillus
beta-lactamases. - Bacillus subtilis periplasmic oligopeptide-binding protein (gene oppA). - Borrelia burgdorferi outer
surface proteins A and B (genes ospA and ospB). - Borrelia hermsii variable major protein 21 (gene vmp21) and 7
(gene vmp7). - Chlamydia trachomatis outer membrane protein 3 (gene omp3). - Fibrobacter succinogenes
endoglucanase cel-3. - Haemophilus influenzae proteins Pal and Pcp. - Klebsiella pullulanase (gene pulA). - Klebsiella
pullulanase secretion protein pulS. - Mycoplasma hyorhinis protein p37. - Mycoplasma hyorhinis variant surface antigens
A, B, and C (genes vlpABC). - Neisseria outer membrane protein H.8. - Pseudomonas aeruginosa lipopeptide (gene
lppL). - Pseudomonas solanacearum endoglucanase egl. - Rhodopseudomonas viridis reaction center cytochrome
subunit (gene cytC). - Rickettsia 17 Kd antigen. - Shigella flexneri invasion plasmid proteins mxiJ and mxiM. -
Streptococcus pneumoniae oligopeptide transport protein A (gene amiA). - Treponema pallidum 34 Kd antigen. - Treponema
pallidum membrane protein A (gene tmpA). - Vibrio harveyi chitobiase (gene chb). - Yersinia virulence plasmid protein
yscJ. - Halocyanin from Natronobacterium pharaonis [4], a membrane-associated copper-binding protein. This is the
first archaebacterial protein known to be modified in such a fashion. From the precursor sequences of all these proteins,
a consensus pattern and a set of rules to identify this type of post-translational modification were derived.
[0957] Consensus pattern: {DERK}(6)-[LIVMFWSTAG](2)-[LIVMFYSTAGCQ]-[AGS]-C [C is the lipid attachment site].
Additional rules: 1) The cysteine must be between positions 15 and 35 of the sequence in consideration. 2) There
must be at least one Lys or one Arg in the first seven positions of the sequence.

[ 1] Hayashi S., Wu H.C. J. Bioenerg. Biomembr. 22:451-471(1990).
[ 2] Klein P., Somorjai R.L., Lau P.C.K. Protein Eng. 2:15-20(1988).
[ 3] von Heijne G. Protein Eng. 2:531-534(1989).
[ 4] Mattar S., Scharf B., Kent S.B.H., Rodewald K., Oesterhelt D., Engelhard M. J. Biol. Chem. 269:14939-14945
(1994).
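
The lipid attachment site entry combines a consensus pattern, {DERK}(6)-[LIVMFWSTAG](2)-[LIVMFYSTAGCQ]-[AGS]-C, with two positional rules. The sketch below shows one way the three conditions might be checked together. The function name, the regular-expression rendering of the pattern, and the test sequences are assumptions made for illustration; they are not part of the specification.

```python
import re

# Regex rendering of {DERK}(6)-[LIVMFWSTAG](2)-[LIVMFYSTAGCQ]-[AGS]-C.
# A lookahead is used so that overlapping candidate sites are all visited;
# group 1 captures the cysteine that would carry the lipid.
LIPOBOX = re.compile(r"(?=[^DERK]{6}[LIVMFWSTAG]{2}[LIVMFYSTAGCQ][AGS](C))")


def lipid_attachment_sites(seq: str) -> list[int]:
    """Return 1-based positions of cysteines meeting the pattern and both rules.

    Hypothetical helper for illustration only.
    """
    seq = seq.upper()
    # Rule 2: at least one Lys or Arg in the first seven positions.
    if not any(residue in "KR" for residue in seq[:7]):
        return []
    sites = []
    for match in LIPOBOX.finditer(seq):
        cys_position = match.start(1) + 1   # convert to 1-based numbering
        # Rule 1: the cysteine lies between positions 15 and 35.
        if 15 <= cys_position <= 35:
            sites.append(cys_position)
    return sites


if __name__ == "__main__":
    # Made-up precursor-like sequence, used only to exercise the function.
    print(lipid_attachment_sites("MKATKLVLGAVILGSTLLAGCSSN"))
```

A sequence that matches the pattern but has its cysteine before position 15, or that lacks Lys and Arg in the first seven residues, yields an empty list.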

[0958] 329. (Lipoprotein 5) Prokaryotic membrane lipoprotein lipid attachment site. In prokaryotes, membrane
lipoproteins are synthesized with a precursor signal peptide, which is cleaved by a specific lipoprotein signal
peptidase (signal peptidase II). The peptidase recognizes a conserved sequence and cuts upstream of a cysteine
residue to which a glyceride-fatty acid lipid is attached [1]. Some of the proteins known to undergo such
processing currently include (for recent listings see [1,2,3]): - Major outer membrane lipoprotein
(murein-lipoproteins) (gene lpp). - Escherichia coli lipoprotein-28 (gene nlpA). - Escherichia coli
lipoprotein-34 (gene nlpB). - Escherichia coli lipoprotein nlpC. - Escherichia coli lipoprotein nlpD. -
Escherichia coli osmotically inducible lipoprotein B (gene osmB). - Escherichia coli osmotically inducible
lipoprotein E (gene osmE). - Escherichia coli peptidoglycan-associated lipoprotein (gene pal). - Escherichia coli
rare lipoproteins A and B (genes rplA and rplB). - Escherichia coli copper homeostasis protein cutF (or nlpE). -
Escherichia coli plasmids traT proteins. - Escherichia coli Col plasmids lysis proteins. - A number of Bacillus
beta-lactamases. - Bacillus subtilis periplasmic oligopeptide-binding protein (gene oppA). - Borrelia burgdorferi
outer surface proteins A and B (genes ospA and ospB). - Borrelia hermsii variable major protein 21 (gene vmp21)
and 7 (gene vmp7). - Chlamydia trachomatis outer membrane protein 3 (gene omp3). - Fibrobacter succinogenes
endoglucanase cel-3. - Haemophilus influenzae proteins Pal and Pcp. - Klebsiella pullulanase (gene pulA). -
Klebsiella pullulanase secretion protein pulS. - Mycoplasma hyorhinis protein p37. - Mycoplasma hyorhinis variant
surface antigens A, B, and C (genes vlpABC). - Neisseria outer membrane protein H.8. - Pseudomonas aeruginosa
lipopeptide (gene lppL). - Pseudomonas solanacearum endoglucanase egl. - Rhodopseudomonas viridis reaction center
cytochrome subunit (gene cytC). - Rickettsia 17 Kd antigen. - Shigella flexneri invasion plasmid proteins mxiJ
and mxiM. - Streptococcus pneumoniae oligopeptide transport protein A (gene amiA). - Treponema pallidum 34 Kd
antigen. - Treponema pallidum membrane protein A (gene tmpA). - Vibrio harveyi chitobiase (gene chb). - Yersinia
virulence plasmid protein yscJ. - Halocyanin from Natrobacterium pharaonis [4], a membrane associated
copper-binding protein. This is the first archaebacterial protein known to be modified in such a fashion. From
the precursor sequences of all these proteins, a consensus pattern and a set of rules to identify this type of
post-translational modification have been developed.
[0959] Consensus pattern: {DERK}(6)-[LIVMFWSTAG](2)-[LIVMFYSTAGCQ]-[AGS]-C [C is the lipid attachment site].
Additional rules: 1) The cysteine must be between positions 15 and 35 of the sequence in consideration. 2) There
must be at least one Lys or one Arg in the first seven positions of the sequence.

[0960] [ 1] Hayashi S., Wu H.C. J. Bioenerg. Biomembr. 22:451-471(1990). [ 2] Klein P., Somorjai R.L., Lau P.C.K.
Protein Eng. 2:15-20(1988). [ 3] von Heijne G. Protein Eng. 2:531-534(1989). [ 4] Mattar S., Scharf B., Kent
S.B.H., Rodewald K., Oesterhelt D., Engelhard M. J. Biol. Chem. 269:14939-14945(1994).
[0961] 330. (Lum binding) Riboflavin synthase alpha chain family Lum-binding site signature. The following
proteins have been shown [1,2] to be structurally and evolutionarily related: - Riboflavin synthase alpha chain
(RS-alpha) (gene ribC in Escherichia coli, ribB in Bacillus subtilis and Photobacterium leiognathi, RIB5 in
yeast). This enzyme synthesizes riboflavin from two moles of 6,7-dimethyl-8-(1'-D-ribityl)lumazine (Lum), a
pteridine-derivative. - Photobacterium phosphoreum lumazine protein (LumP) (gene luxL). LumP is a protein that
modulates the color of the bioluminescence emission of bacterial luciferase. In the presence of LumP, light
emission is shifted to higher energy values (shorter wavelength). LumP binds non-covalently to
6,7-dimethyl-8-(1'-D-ribityl)lumazine. - Vibrio fischeri yellow fluorescent protein (YFP) (gene luxY). Like LumP,
YFP modulates light emission but towards a longer wavelength. YFP binds non-covalently to FMN. These proteins
seem to have evolved from the duplication of a domain of about 100 residues. In its C-terminal section, this
domain contains a conserved motif [KR]-V-N-[LI]-E which has been proposed to be the binding site for Lum.
RS-alpha, which binds two molecules of Lum, has two perfect copies of this motif, while LumP, which binds one
molecule of Lum, has a Glu instead of Lys/Arg in the first position of the second copy of the motif. Similarly,
YFP, which binds to one molecule of FMN, also seems to have a potentially dysfunctional binding site by
substitution of Gly for Glu in the last position of the first copy of the motif. Our signature pattern includes
the Lum-binding motif.

[0962] Consensus pattern: [LIVMF]-x(5)-G-[STADNQ]-[KREQIYW]-V-N-[LIVM]-E

[ 1] O'Kane D.J., Woodward B., Lee J., Prasher D.C. Proc. Natl. Acad. Sci. U.S.A. 88:1100-1104(1991).
[ 2] O'Kane D.J., Prasher D.C. Mol. Microbiol. 6:443-449(1992).
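
The Lum-binding entry distinguishes proteins by the number of perfect copies of the motif [KR]-V-N-[LI]-E that they carry (two in RS-alpha, one in LumP). A minimal sketch of how such copies could be counted in a sequence is given below. The function name and the toy sequences are assumptions for illustration and are not real RS-alpha or LumP sequences.

```python
import re

# Regex rendering of the Lum-binding motif [KR]-V-N-[LI]-E quoted in the text.
LUM_MOTIF = re.compile(r"[KR]VN[LI]E")


def count_lum_motifs(seq: str) -> int:
    """Count non-overlapping perfect copies of the Lum-binding motif.

    Hypothetical helper for illustration only.
    """
    return len(LUM_MOTIF.findall(seq.upper()))


if __name__ == "__main__":
    # Toy sequences (made up): two perfect copies, then one perfect copy plus
    # a copy whose first position carries Glu instead of Lys/Arg.
    print(count_lum_motifs("AAKVNLEGGGRVNIEAA"))
    print(count_lum_motifs("AAKVNLEGGGEVNIEAA"))
```

Because the motif has a fixed length of five residues and cannot overlap itself, counting non-overlapping matches is sufficient here.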

[0963] 331. Lysyl oxidase putative copper-binding region signature

Lysyl oxidase (LOX) [1] is an extracellular copper-dependent enzyme that catalyzes the oxidative deamination of
peptidyl lysine residues in precursors of various collagens and elastins. The deaminated lysines are then able to
form aldehyde cross-links. LOX binds a single copper atom which seems to reside within an octahedral coordination
complex which includes at least three histidine ligands. Four histidine residues are clustered in a central
region of the enzyme. This region is thought to be involved in copper-binding and is called the 'copper-talon'
[1]. This region was used as a signature pattern.

[0964] Consensus pattern: W-E-W-H-S-C-H-Q-H-Y-H

[0965] [ 1] Krebs C.J., Krawetz S.A. Biochim. Biophys. Acta 1202:7-12(1993).
[0966] 332. Metallo-beta-lactamase superfamily (lactamase_B)
[0967] [1] Neuwald AF, Liu JS, Lipman DJ, Lawrence CE. Nucleic Acids Res 1997;25:1665-1677. [2] Carfi A, Pares
S, Duee E, Galleni M, Duez C, Frere JM, Dideberg O. EMBO J 1995;14:4914-4921.
[0968] 333. L-lactate dehydrogenase active site (ldh1)

L-lactate dehydrogenase (EC 1.1.1.27) (LDH) [1] catalyzes the reversible NAD-dependent interconversion of
pyruvate to L-lactate. In vertebrate muscles and in lactic acid bacteria it represents the final step in
anaerobic glycolysis. This tetrameric enzyme is present in prokaryotic and eukaryotic organisms. In vertebrates
there are three isozymes of LDH: the M form (LDH-A), found predominantly in muscle tissues; the H form (LDH-B),
found in heart muscle; and the X form (LDH-C), found only in the spermatozoa of mammals and birds. In birds and
crocodilian eye lenses, LDH-B serves as a structural protein and is known as epsilon-crystallin [2].
L-2-hydroxyisocaproate dehydrogenase (EC 1.1.1.-) (L-hicDH) [3] catalyzes the reversible and stereospecific
interconversion between 2-ketocarboxylic acids and L-2-hydroxy-carboxylic acids. L-hicDH is evolutionarily
related to LDH's. As a signature for LDH's a region was selected that includes a conserved histidine which is
essential to the catalytic mechanism.
[0969] Consensus pattern: [LIVMA]-G-[EQ]-H-G-[DN]-[ST] [H is the active site residue]

[ 1] Abad-Zapatero C., Griffith J.P., Sussman J.L., Rossmann M.G. J. Mol. Biol. 198:445-467(1987).
[ 2] Hendriks W., Mulders J.W.M., Bibby M.A., Slingsby C., Bloemendal H., de Jong W.W. Proc. Natl. Acad. Sci.
U.S.A. 85:7114-7118(1988).
[ 3] Lerch H.-P., Frank R., Collins J. Gene 83:263-270(1989).

[0970] Malate dehydrogenase active site signature (ldh2)

Malate dehydrogenase (EC 1.1.1.37) (MDH) [1,2] catalyzes the interconversion of malate to oxaloacetate utilizing
the NAD/NADH cofactor system. The enzyme participates in the citric acid cycle and exists in all aerobic
organisms. While prokaryotic organisms contain a single form of MDH, in eukaryotic cells there are two isozymes:
one which is located in the mitochondrial matrix and the other in the cytoplasm. Fungi and plants also harbor a
glyoxysomal form which functions in the glyoxylate pathway. In plant chloroplasts there is an additional
NADP-dependent form of MDH (EC 1.1.1.82) which is essential for both the universal C3 photosynthesis (Calvin)
cycle and the more specialized C4 cycle. As a signature pattern for this enzyme a region was chosen that includes
two residues involved in the catalytic mechanism [3]: an aspartic acid which is involved in a proton relay
mechanism, and an arginine which binds the substrate.
[0971] Consensus pattern: [LIVM]-T-[TRKMN]-L-D-x(2)-R-[STA]-x(3)-[LIVMFY] [D and R are the active site residues]

[ 1] McAlister-Henn L. Trends Biochem. Sci. 13:178-181(1988).
[ 2] Gietl C. Biochim. Biophys. Acta 1100:217-234(1992).
[ 3] Birktoft J.J., Rhodes G., Banaszak L.J. Biochemistry 28:6065-6081(1989).
[ 4] Cendrin F., Chroboczek J., Zaccai G., Eisenberg H., Mevarech M. Biochemistry 32:4308-4313(1993).

[0972] 334. Legume lectins signatures

Leguminous plants synthesize sugar-binding proteins which are called legume lectins [1,2]. These lectins are
generally found in the seeds. The exact function of legume lectins is not known but they may be involved in the
attachment of nitrogen-fixing bacteria to legumes and in the protection against pathogens. Legume lectins bind
calcium and manganese (or other transition metals). Legume lectins are synthesized as precursor proteins of about
230 to 260 amino acid residues. Some legume lectins are proteolytically processed to produce two chains, beta
(which corresponds to the N-terminal) and alpha (C-terminal). The lectin concanavalin A (conA) from jack bean is
exceptional in that the two chains are transposed and ligated (by formation of a new peptide bond). The
N-terminus of mature conA thus corresponds to that of the alpha chain and the C-terminus to the beta chain. Two
signature patterns specific to legume lectins have been developed: the first is located in the C-terminal section
of the beta chain and contains a conserved aspartic acid residue important for the binding of calcium and
manganese; the second one is located in the N-terminal of the alpha chain.

[0973] Consensus pattern: [LIV]-[STAG]-V-[DEQV]-[FLI]-D-[ST] [D binds manganese and calcium]
Consensus pattern: [LIV]-x-[EDQ]-[FYWKR]-V-x-[LIVF]-G-[LF]-[ST]

[ 1] Sharon N., Lis H. FASEB J. 4:3198-320(1990). 

[2] Lis H., Sharon N. Annu. Rev. Biochem. 55:33-37(1986). 
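
Where an entry defines more than one signature, as the legume lectin entry does for the beta and alpha chains, each pattern can be scanned separately and the hits reported per signature. The sketch below illustrates this for the two patterns [LIV]-[STAG]-V-[DEQV]-[FLI]-D-[ST] and [LIV]-x-[EDQ]-[FYWKR]-V-x-[LIVF]-G-[LF]-[ST]. The dictionary keys, the function name and the toy sequence are assumptions for illustration, not part of the specification.

```python
import re

# Regex renderings of the two legume lectin signatures quoted in the text.
# The keys are hypothetical labels chosen for this sketch.
SIGNATURES = {
    "beta_chain": re.compile(r"[LIV][STAG]V[DEQV][FLI]D[ST]"),
    "alpha_chain": re.compile(r"[LIV].[EDQ][FYWKR]V.[LIVF]G[LF][ST]"),
}


def find_signatures(seq: str) -> dict[str, list[int]]:
    """Return, for each signature, the 1-based start positions of its matches.

    Hypothetical helper for illustration only.
    """
    seq = seq.upper()
    return {
        name: [match.start() + 1 for match in regex.finditer(seq)]
        for name, regex in SIGNATURES.items()
    }


if __name__ == "__main__":
    # Toy sequence (made up) carrying one instance of each signature.
    print(find_signatures("AAVSVEFDTGGIAEFVALGLS"))
```

A sequence that carries only one of the two signatures returns an empty list for the other, which makes partial matches easy to spot.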

[0974] 335. CoA-ligases (ligases-CoA)

[0975] This family includes the CoA ligases Succinyl-CoA synthetase alpha and beta chains, malate CoA ligase and
ATP-citrate lyase. Some members of the family utilise ATP; others use GTP.
[0976] [1] Wolodko WT, Fraser ME, James MN, Bridger WA. J Biol Chem 1994;269:10883-10890.
[0977] 336. Linker histone H1 and H5 family

[0978] Linker histone H1 is an essential component of chromatin structure. H1 links nucleosomes into higher order
structures. Histone H1 is replaced by histone H5 in some cell types.

[0979] [1] Ramakrishnan V, Finch JT, Graziano V, Lee PL, Sweet RM. Nature 1993;362:219-223.
[0980] 337. Lipocalin signature (lip1)

Proteins which transport small hydrophobic molecules such as steroids, bilins, retinoids, and lipids share limited
regions of sequence homology and a common tertiary structure architecture [1 to 5]. This is an eight stranded
antiparallel beta-barrel with a repeated + 1 topology enclosing an internal ligand binding site [1,3]. The name
'lipocalin' has been proposed [5] for this protein family. Proteins known to belong to this family are listed
below (references are only provided for recently determined sequences). - Alpha-1-microglobulin (protein HC),
which seems to bind porphyrin. - Alpha-1-acid glycoprotein (orosomucoid), which can bind a remarkable array of
natural and synthetic compounds [6]. - Aphrodisin which, in hamsters, functions as an aphrodisiac pheromone. -
Apolipoprotein D, which probably binds heme-related compounds. - Beta-lactoglobulin, a milk protein whose
physiological function appears to bind retinol. - Complement component C8 gamma chain, which seems to bind
retinol [7]. - Crustacyanin [8], a protein from lobster carapace, which binds astaxanthin, a carotenoid. -
Epididymal-retinoic acid binding protein (E-RABP) [9] involved in sperm maturation. - Insectacyanin, a moth
bilin-binding protein, and a related butterfly bilin-binding protein (BBP). - Late Lactation protein (LALP), a
milk protein from tammar wallaby [10]. - Neutrophil gelatinase-associated lipocalin (NGAL) (p25) (SV-40 induced
24p3 protein) [11]. - Odorant-binding protein (OBP), which binds odorants. - Plasma retinol-binding proteins
(PRBP). - Human pregnancy-associated endometrial alpha-2 globulin. - Probasin (PB), a rat prostatic protein. -
Prostaglandin D synthase (EC 5.3.99.2) (GSH-independent PGD synthetase), a lipocalin with enzymatic activity
[12]. - Purpurin, a retinal protein which binds retinol and heparin. - Quiescence specific protein p20K from
chicken (embryo CH21 protein). - Rodent urinary proteins (alpha-2-microglobulin), which may bind pheromones. -
VNSP 1 and 2, putative pheromone transport proteins from mouse vomeronasal organ [13]. - Von Ebner's gland
protein (VEGP) [14] (also called tear lipocalin), a mammalian protein which may be involved in taste recognition.
- A frog olfactory protein, which may transport odorants. - A protein found in the cerebrospinal fluid of the
toad Bufo marinus with a supposed function similar to transthyretin in transport across the blood brain barrier
[15]. - Lizard's epididymal secretory protein IV (LESP IV), which could transport small hydrophobic molecules
into the epididymal fluid during sperm maturation [16]. - Prokaryotic outer-membrane protein blc [17]. The
sequences of most members of the family, the core or kernel lipocalins, are characterized by three short
conserved stretches of residues [3,18]. Others, the outlier lipocalin group, share only one or two of these
[3,19]. A signature pattern was built around the first, common to all outlier and kernel lipocalins, which occurs
near the start of the first beta-strand.

[0981] Consensus pattern: [DENG]-x-[DENQGSTARK]-x(0,2)-[DENQARK]-[LIVFY]-{CP}-G-{C}-W-[FYWLRH]-x-[LIVMTA]

Note: It is suggested, on the basis of similarities of structure, function, and sequence, that this family forms
an overall superfamily, called the calycins, with the avidin/streptavidin <PDOC00499> and the cytosolic
fatty-acid binding proteins <PDOC00188> families [2,19].

[ 1] Cowan S.W., Newcomer M.E., Jones T.A. Proteins 8:44-61(1990).
[ 2] Igarashi M., Nagata A., Toh H., Urade Y., Hayaishi O. Proc. Natl. Acad. Sci. U.S.A. 89:5376-5380(1992).
[ 3] Flower D.R., North A.C.T., Attwood T.K. Protein Sci. 2:753-761(1993).
[ 4] Godovac-Zimmermann J. Trends Biochem. Sci. 13:64-66(1988).
[ 5] Pervaiz S., Brew K. FASEB J. 1:209-214(1987).
[ 6] Kremer J.M.H., Wilting J., Janssen L.H.M. Pharmacol. Rev. 40:1-47(1989).
[ 7] Haefliger J.-A., Peitsch M.C., Jenne D., Tschopp J. Mol. Immunol. 28:123-131(1991).
[ 8] Keen J.N., Caceres I., Eliopoulos E.E., Zagalsky P.F., Findlay J.B.C. Eur. J. Biochem. 197:407-417(1991).
[ 9] Newcomer M.E. Structure 1:7-18(1993).
[10] Collet C., Joseph R. Biochim. Biophys. Acta 1167:219-222(1993).
[11] Kjeldsen L., Johnsen A.H., Sengelov H., Borregaard N. J. Biol. Chem. 268:10425-10432(1993).
[12] Peitsch M.C., Boguski M.S. Trends Biochem. Sci. 16:363-363(1991).
[13] Miyawaki A., Matsushita F., Ryo Y., Mikoshiba K. EMBO J. 13:5835-5842(1994).
[14] Kock K., Ahlers C., Schmale H. Eur. J. Biochem. 221:905-916(1994).
[15] Achen M.G., Harms P.J., Thomas T., Richardson S.J., Wettenhall R.E.H., Schreiber G. J. Biol. Chem. 267:
23170-23174(1992).
[16] Morel L., Dufaure J.-P., Depeiges A. J. Biol. Chem. 268:10274-10281(1993).
[17] Bishop R.E., Penfold S.S., Frost L.S., Holtje J.-V., Weiner J.H. J. Biol. Chem. 270:23097-23103(1995).
[18] Flower D.R., North A.C.T., Attwood T.K. Biochem. Biophys. Res. Commun. 180:69-74(1991).
[19] Flower D.R. FEBS Lett. 333:99-102(1993).

[0982] Cytosolic fatty-acid binding proteins signature (lip2)

A number of low molecular weight proteins which bind fatty acids and other organic anions are present in the
cytosol [1,2]. Most of them are structurally related and have probably diverged from a common ancestor. This
structure is a ten stranded antiparallel beta-barrel, albeit with a wide discontinuity between the fourth and
fifth strands, with a repeated + 1 topology enclosing an internal ligand binding site [2,7]. Proteins known to
belong to this family include: - Six, tissue-specific, types of fatty acid binding proteins (FABPs) found in
liver, intestine, heart, epidermal, adipocyte, brain/retina. Heart FABP is also known as mammary-derived growth
inhibitor (MDGI), a protein that reversibly inhibits proliferation of mammary carcinoma cells. Epidermal FABP is
also known as psoriasis-associated FABP [3]. - Insect muscle fatty acid-binding proteins. - Testis lipid binding
protein (TLBP). - Cellular retinol-binding proteins I and II (CRBP). - Cellular retinoic acid-binding protein
(CRABP). - Gastrotropin, an ileal protein which stimulates gastric acid and pepsinogen secretion. It seems that
gastrotropin binds to bile salts and bilirubins. - Fatty acid binding proteins MFB1 and MFB2 from the midgut of
the insect Manduca sexta [4]. In addition to the above cytosolic proteins, this family also includes: - Myelin P2
protein, which may be a lipid transport protein in Schwann cells. P2 is associated with the lipid bilayer of
myelin. - Schistosoma mansoni protein Sm14 [5] which seems to be involved in the transport of fatty acids. -
Ascaris suum p18, a secreted protein that may play a role in sequestering potentially toxic fatty acids and their
peroxidation products or that may be involved in the maintenance of the impermeable lipid layer of the eggshell.
- Hypothetical fatty acid-binding proteins F40F4.2, F40F4.3, F40F4.4 and ZK742.5 from Caenorhabditis elegans. As
a signature pattern for these proteins a segment from the N-terminal extremity was used.

[0983] Consensus pattern: [GSAIVK]-x-[FYW]-x-[LIVMF]-x(4)-[NHG]-[FY]-[DE]-x-[LIVMFY]-[LIVM]-x(2)-[LIVMAKR]

Note: It is suggested, on the basis of similarities of structure, function, and sequence, that this family forms
an overall superfamily, called the calycins, with the lipocalin <PDOC00187> and avidin/streptavidin <PDOC00499>
families [6,7].

[ 1] Bernier I., Jolles P. Biochimie 69:1127-1152(1987).
[ 2] Veerkamp J.H., Peeters R.A., Maatman R.G.H.J. Biochim. Biophys. Acta 1081:1-24(1991).
[ 3] Siegenthaler G., Hotz R., Chatellard-Gruaz D., Didierjean L., Hellman U., Saurat J.-H. Biochem. J. 302:
363-371(1994).
[ 4] Smith A.F., Tsuchida K., Hanneman E., Suzuki T.C., Wells M.A. J. Biol. Chem. 267:380-384(1992).
[ 5] Moser D., Tendler M., Griffiths G., Klinkert M.-Q. J. Biol. Chem. 266:8447-8454(1991).
[ 6] Flower D.R., North A.C.T., Attwood T.K. Protein Sci. 2:753-761(1993).
[ 7] Flower D.R. FEBS Lett. 333:99-102(1993).

[0984] 338. Lipoxygenases iron-binding region signatures

Lipoxygenases (EC 1.13.11.-) are a class of iron-containing dioxygenases which catalyze the hydroperoxidation of
lipids containing a cis,cis-1,4-pentadiene structure. They are common in plants where they may be involved in a
number of diverse aspects of plant physiology including growth and development, pest resistance, and senescence,
or responses to wounding [1]. In mammals a number of lipoxygenase isozymes are involved in the metabolism of
prostaglandins and leukotrienes [2]. Sequence data is available for the following lipoxygenases: - Plant
lipoxygenases (EC 1.13.11.12). Plants express a variety of cytosolic isozymes as well as what seems [3] to be a
chloroplast isozyme. - Mammalian arachidonate 5-lipoxygenase (EC 1.13.11.34). - Mammalian arachidonate
12-lipoxygenase (EC 1.13.11.31). - Mammalian erythroid cell-specific 15-lipoxygenase (EC 1.13.11.33). The iron
atom in lipoxygenases is bound by four ligands, three of which are histidine residues [4]. Six histidines are
conserved in all lipoxygenase sequences; five of them are found clustered in a stretch of 40 amino acids. This
region contains two of the three histidine iron-ligands; the other histidines have been shown [5] to be important
for the activity of lipoxygenases. As signatures for this family of enzymes two patterns in the region of the
histidine cluster were selected. The first pattern contains the first three conserved histidines and the second
pattern includes the fourth and the fifth.

Consensus pattern: H-[EQ]-x(3)-H-x-[LM]-[NQRC]-[GST]-H-[LIVMSTAC](3)-E [The second and third H's bind iron]
[0985] Consensus pattern: [LIVMA]-H-P-[LIVM]-x-[KRQ]-[LIVMF](2)-x-[AP]-H

[ 1] Vick B.A., Zimmerman D.C. (In) Biochemistry of plants: A comprehensive treatise, Stumpf P.K., Ed., Vol. 9,
pp.53-90, Academic Press, New-York, (1987).
[ 2] Needleman P., Turk J., Jakschik B.A., Morrison A.R., Lefkowith J.B. Annu. Rev. Biochem. 55:69-102(1986).
[ 3] Peng Y.-L., Shirano Y., Ohta H., Hibino T., Tanaka K., Shibata D. J. Biol. Chem. 269:3755-3761(1994).
[ 4] Boyington J.C., Gaffney B.J., Amzel L.M. Science 260:1482-1486(1993).
[ 5] Steczko J., Donoho G.P., Clemens J.C., Dixon J.E., Axelrod B. Biochemistry 31:4053-4057(1992).

[0986] 339. Fumarate lyases signature (lyase_1)

A number of enzymes, belonging to the lyase class, for which fumarate is a substrate have been shown [1,2] to
share a short conserved sequence around a methionine which is probably involved in the catalytic activity of this
type of enzymes. These enzymes are: - Fumarase (EC 4.2.1.2) (fumarate hydratase), which catalyzes the reversible
hydration of fumarate to L-malate. There seem to be 2 classes of fumarases: class I are thermolabile dimeric
enzymes (as for example Escherichia coli fumC); class II enzymes are thermostable and tetrameric and are found in
prokaryotes (as for example Escherichia coli fumA and fumB) as well as in eukaryotes. The sequences of the two
classes of fumarases are not closely related. - Aspartate ammonia-lyase (EC 4.3.1.1) (aspartase), which catalyzes
the reversible conversion of aspartate to fumarate and ammonia. This reaction is analogous to that catalyzed by
fumarase, except that ammonia rather than water is involved in the trans-elimination reaction. -
Argininosuccinase (EC 4.3.2.1) (argininosuccinate lyase), which catalyzes the formation of arginine and fumarate
from argininosuccinate, the last step in the biosynthesis of arginine. - Adenylosuccinase (EC 4.3.2.2)
(adenylosuccinate lyase) [3], which catalyzes the eighth step in the de novo biosynthesis of purines, the
formation of 5'-phosphoribosyl-5-amino-4-imidazolecarboxamide and fumarate from
1-(5-phosphoribosyl)-4-(N-succino-carboxamide). That enzyme can also catalyze the formation of fumarate and AMP
from adenylosuccinate. - Pseudomonas putida 3-carboxy-cis,cis-muconate cycloisomerase (EC 5.5.1.2)
(3-carboxymuconate lactonizing enzyme) (gene pcaB) [4], an enzyme involved in aromatic acids catabolism.
[0987] Consensus pattern: G-S-x(2)-M-x(2)-K-x-N

[ 1] Woods S.A., Schwartzbach S.D., Guest J.R. Biochim. Biophys. Acta 954:14-26(1988).
[ 2] Woods S.A., Miles J.S., Guest J.R. FEMS Microbiol. Lett. 51:181-186(1988).
[ 3] Zalkin H., Dixon J.E. Prog. Nucleic Acid Res. Mol. Biol. 42:259-287(1992).
[ 4] Williams S.E., Woolridge E.M., Ransom S.C., Landro J.A., Babbitt P.C., Kozarich J.W. Biochemistry 31:
9768-9776(1992).

[0988] 340. MCM family signature and profile

Proteins shown to be required for the initiation of eukaryotic DNA replication share a highly conserved domain of
about 210 amino-acid residues [1,2,3]. The latter shows some similarities [4] with that of various other families
of DNA-dependent ATPases. Eukaryotes seem to possess a family of six proteins that contain this domain. They were
first identified in yeast where most of them have a direct role in the initiation of chromosomal DNA replication
by interacting directly with autonomously replicating sequences (ARS). They were thus called 'minichromosome
maintenance proteins' with gene symbols prefixed by MCM. These six proteins are: - MCM2, also known as cdc19 (in
S.pombe). - MCM3, also known as DNA polymerase alpha holoenzyme-associated protein P1, RLF beta subunit or ROA. -
MCM4, also known as CDC54, cdc21 (in S.pombe) or dpa (in Drosophila). - MCM5, also known as CDC46 or nda4 (in
S.pombe). - MCM6, also known as mis5 (in S.pombe). - MCM7, also known as CDC47 or Prolifera (in A.thaliana). This
family is also present in archaebacteria. In Methanococcus jannaschii there are four members: MJ0363, MJ0961,
MJ1489 and MJECL13. The presence of a putative ATP-binding domain implies that these proteins may be involved in
an ATP-consuming step in the initiation of DNA replication in eukaryotes. As a signature pattern, a perfectly
conserved region was selected that represents a special version of the B motif found in ATP-binding proteins.
[0989] Consensus pattern: G-[IVT]-[LVAC](2)-[IVT]-D-[DE]-[FL]-[DNST]

[ 1] Coxon A., Maundrell K., Kearsey S.E. Nucleic Acids Res. 20:5571-5577(1992).
[ 2] Hu B., Burkhart R., Schulte D., Musahl C., Knippers R. Nucleic Acids Res. 21:5289-5293(1993).
[ 3] Tye B.-K. Trends Cell Biol. 4:160-166(1994).
[ 4] Koonin E.V. Nucleic Acids Res. 21:2541-2547(1993).

[0990] 341. Macrophage migration inhibitory factor family signature (MIF)

A protein called macrophage migration inhibitory factor (MIF) [1] seems to exert an important role in host
inflammatory responses. It plays a pivotal role in the host response to endotoxic shock and appears to serve as a
pituitary "stress" hormone that regulates systemic inflammatory responses. MIF is a secreted protein of 115
residues which is not processed from a larger precursor. D-dopachrome tautomerase [2] is a mammalian cytoplasmic
enzyme involved in melanin biosynthesis and that tautomerizes D-dopachrome with concomitant decarboxylation to
give 5,6-dihydroxyindole (DHI). It is a protein of 117 residues highly related to MIF. It must be noted that MIF
binds glutathione and has been said to be related to glutathione S-transferases. This assertion has been later
disproved [3]. As a signature pattern for these proteins a conserved region was selected, located in the central
section.
[0991] Consensus pattern: [DE]-P-C-A-x(3)-[LIVM]-x-S-I-G-x-[LIVM]-G

[ 1] Bucala R. Immunol. Lett. 43:23-26(1994).
[ 2] Odh G., Hindemith A., Rosengren A.-M., Rosengren E., Rorsman H. Biochem. Biophys. Res. Commun. 197:
619-624(1993).
[ 3] Pearson W.R. Protein Sci. 3:525-527(1994).
[0992] 342. MIP family signature

Recently the sequence of a number of different proteins, that all seem to be transmembrane channel proteins, has
been found to be highly related [1 to 4]. These proteins are listed below. - Mammalian major intrinsic protein
(MIP). MIP is the major component of lens fiber gap junctions. Gap junctions mediate direct exchange of ions and
small molecules from one cell to another. - Mammalian aquaporins [5]. These proteins form water-specific channels
that provide the plasma membranes of red cells and kidney proximal and collecting tubules with high permeability
to water, thereby permitting water to move in the direction of an osmotic gradient. - Soybean nodulin-26, a major
component of the peribacteroid membrane induced during nodulation in legume roots after Rhizobium infection. -
Plants tonoplast intrinsic proteins (TIP). There are various isoforms of TIP: alpha (seed), gamma, Rt (root), and
Wsi (water-stress induced). These proteins may allow the diffusion of water, amino acids and/or peptides from the
tonoplast interior to the cytoplasm. - Bacterial glycerol facilitator protein (gene glpF), which facilitates the
movement of glycerol across the cytoplasmic membrane. - Salmonella typhimurium propanediol diffusion facilitator
(gene pduF). - Yeast FPS1, a glycerol uptake/efflux facilitator protein. - Drosophila neurogenic protein 'big
brain' (bib). This protein may mediate intercellular communication; it may function by allowing the transport of
certain molecule(s) and thereby sending a signal for an exodermal cell to become an epidermoblast instead of a
neuroblast. - Yeast hypothetical protein YFL054c. - A hypothetical protein from the pepX region of Lactococcus
lactis. The MIP family proteins seem to contain six transmembrane segments. Computer analysis shows that these
proteins probably arose by a tandem, intragenic duplication event from an ancestral protein that contained three
transmembrane segments. As a signature pattern a well conserved region was selected which is located in a
probable cytoplasmic loop between the second and third transmembrane regions.

[0993] Consensus pattern: [HNQA]-x-N-P-[STA]-[LIVMF]-[ST]-[LIVMF]-[GSTAFY]

[13RecerJ Peizei A Saier M H Jr CRC Cut Biochem 2H ZVo-2b 7(1 '393 j 
[23 Baker M.E., Saier MM. Jr. Ceil 60:165-186(1990). 

[ 33 Pac tj M Wu L-F Johnscn K D Hotfte H r hnspt^ls M J Swe-it ^ Sandal n U Saiei M H Ji Mcl 
ss Microbiol. 5:33-37(19911 

[4j\-^bto^GJ Pisano M M Chenelmsky A B Tiends Biochem Sci 10 )7u-i71i 19U1 1 
[ 5] in, peels M J Ayie P Ti ends Bioc hem Sci 19 421-4?5( l"94t 
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[0994] 34c Manrtelate racernase > muconate lactoneing enzyme famtfs stgnatutes 

Mandelate racemase (EC 5.1.2.2) (MR) and muconate lactonizing enzyme (EC 5.5.1.1) (MLE) are two bacterial
enzymes involved in aromatic acid catabolism. They catalyze mechanistically distinct reactions, yet they are related at
the level of their primary, quaternary (homooctamer) and tertiary structures [1,2]. A number of other proteins also seem
to be evolutionarily related to these two enzymes. These are: - The various plasmid-encoded chloromuconate
cycloisomerases (EC 5.5.1.7). - Escherichia coli protein rspA [3]. rspA seems to be involved in the degradation of homoserine
lactone (HSL) or of one of its metabolites. - Escherichia coli hypothetical protein ycjG. - Escherichia coli hypothetical
protein yidU. - A hypothetical protein from Streptomyces ambofaciens [4]. Two signature patterns have been developed
for these enzymes; both contain conserved acidic residues.

[0995] The second pattern contains an aspartate and a glutamate which are ligands for either a magnesium ion (in
MR) or a manganese ion (in MLE).

[0996] Consensus pattern: A-x-[SAGCN]-[SAG]-[LIVM]-[DEQ]-x-A-[LA]-x-[DE]-[LIA]-x-[GA]-[KRQ]-x(4)-[PSA]-
[LIV]-x(2)-L-[LIVMF]-

Consensus pattern: [LIVF]-x(2)-D-x-[NH]-x(7)-[ACL]-x(6)-[LIVMF]-x(7)-[LIVM]-E-[DENQ]-P [D and E bind a divalent
metal ion]-

[1] Neidhart D.J., Kenyon G.L., Gerlt J.A., Petsko G.A. Nature 347:692-694(1990).
[2] Petsko G.A., Kenyon G.L., Gerlt J.A., Ringe D., Kozarich J.W. Trends Biochem. Sci. 18:372-376(1993).
[3] Huisman G.W., Kolter R. Science 265:537-539(1994).
[4] Schneider D., Aigle B., Leblond P., Simonet J.-M., Decaris B. J. Gen. Microbiol. 139:2559-2567(1993).

[0997] 344. Merozoite Surface Antigen 2 (MSA-2) family

[0998] Thomas AW, Carr DA, Carter JM, Lyon JA; Mol Biochem Parasitol 1990;43:211-220.
[0999] 345. MSP (Major sperm protein) domain
[1000] Major sperm proteins are involved in sperm motility. These proteins oligomerise to form filaments. Partial
matches to this domain are also found in other non-MSP proteins. These include Swiss P40076 and Swiss 9M6i 3.
[1001] [1] Bullock TL, Roberts TM, Stewart M; J Mol Biol 1996;263:284-296. [2] King KL, Stewart M, Roberts TM,
Seavy M; J Cell Sci 1992;101:847-857.

[1002] 346. (Matrix) Viral matrix protein. Found in Morbillivirus, paramyxovirus and pneumovirus. Number of
members: 105

[1003] 347. O-methyltransferase (methyltransf)

[1004] This family includes a range of O-methyltransferases. These enzymes utilise S-adenosyl methionine.
[1005] [1] Keller NP, Dischinger HC, Bhatnagar D, Cleveland TE, Ullah AH; Appl Environ Microbiol 1993;59:479-484.
[1006] 348. Magnesium chelatase, subunit ChlI
[1007] Magnesium-chelatase is a three-component enzyme that catalyses the insertion of Mg2+ into protoporphyrin
IX. This is the first unique step in the synthesis of (bacterio)chlorophyll. Because of this, it is thought that Mg-chelatase has
an important role in channeling intermediates into the (bacterio)chlorophyll branch in response to conditions suitable
for photosynthetic growth. ChlI and BchD have molecular weights between 38 and 42 kDa.

[1008] [1] Walker CJ, Willows RD; Biochem J 1997;327:321-333. [2] Petersen BL, Jensen PE, Gibson LC, Stummann
BM, Hunter CN, Henningsen KW; J Bacteriol 1998;180:699-704.
[1009] 349. Plasmid recombination enzyme (Mob_Pre)

[1010] With some plasmids, recombination can occur in a site-specific manner that is independent of RecA. In such
cases, the recombination event requires another protein called Pre. Pre is a plasmid recombination enzyme. This
protein is also known as Mob (conjugating mobilization).
[1011] [1] Priebe SD, Lacks SA; J Bacteriol 1989;171:4778-4784.
[1012] 350. Monooxygenase

[1013] This family includes diverse enzymes that utilise FAD.

[1014] [1] Gatti DL, Palfey BA, Lah MS, Entsch B, Massey V, Ballou DP, Ludwig ML; Science 1994;266:110-114.
[1015] 351. Mov34 family 

[1016] Members of this family are found in proteasome regulatory subunits, eukaryotic initiation factor 3 (eIF3)
subunits and regulators of transcription factors.

[1017] [1] Aravind L, Ponting CP; Protein Sci 1998;7:1250-1254. [2] Hershey JW, Asano K, Naranda T, Vornlocher
HP, Hanachi P, Merrick WC; Biochimie 1996;78:903-907.
[1018] 352. Myc amino-terminal region (Myc_N_term)
[1019] The myc family belongs to the basic helix-loop-helix leucine zipper class of transcription factors, see HLH.
Myc forms a heterodimer with Max, and this complex regulates cell growth through direct activation of genes involved
in cell replication [2].

[1020] [1] Facchini LM, Penn LZ; FASEB J 1998;12:633-651. [2] Grandori C, Eisenman RN; Trends Biochem Sci
1997;22:177-181.

[1021] 353. (Metallothio_2) Metallothionein. Members of this family are metallothioneins. These proteins are
cysteine-rich proteins that bind to heavy metals. Members of this family appear to be closest to Class II
metallothioneins, see metalthio. Number of members: 55
[1022] [1] Medline: 98267202. Characterization of gene repertoires at mature stage of citrus fruits through random
sequencing and analysis of redundant metallothionein-like genes expressed during fruit development. Moriguchi T,
Kita M, Hisada S, Endo-Inagaki T, Omura M; Gene 1998;211:221-227.
[1023] 354. MAGE family

[1024] The MAGE (melanoma antigen-encoding gene) family are expressed in a wide variety of tumors but not in
normal cells, with the exception of the male germ cells, placenta, and, possibly, cells of the developing embryo. The
cellular function of this family is unknown.

[1025] [1] McCurdy DK, Tai LQ, Nguyen J, Wang Z, Yang HM, Udar N, Naiem F, Concannon P, Gatti RA; Mol Genet
Metab 1998;63:3-13.

[1026] 355. Malic enzymes signature. Malic enzymes, or malate oxidoreductases, catalyze the oxidative
decarboxylation of malate into pyruvate, important for a wide range of metabolic pathways. There are three related forms of malic
enzyme [1,2,3]:

- NAD-dependent malic enzyme (EC 1.1.1.38), which uses preferentially NAD and has the ability to decarboxylate
oxaloacetate (OAA). It is found in bacteria and insects.
- NAD-dependent malic enzyme (EC 1.1.1.39), which uses preferentially NAD and is unable to decarboxylate OAA. It is
found in the mitochondrial matrix of plants and is a heterodimer of highly related subunits.
- NADP-dependent malic enzyme (EC 1.1.1.40), which has a preference for NADP and has the ability to decarboxylate
OAA. This form has been found in fungi, animals and plants. In mammals, there are two isozymes, one mitochondrial
and the other cytosolic. Plants also have two isozymes: chloroplastic and cytosolic.

There are two other proteins which are closely structurally related to malic enzymes:

- Escherichia coli protein sfcA, whose function is not yet known but which could be an NAD or NADP-dependent malic enzyme.
- Yeast hypothetical protein YKL029c, a probable malic enzyme.

There are three well conserved regions in the enzyme sequences. Two of them seem to be involved in binding NAD
or NADP. The significance of the third one, located in the central part of the enzymes, is not yet known. This region
has been developed as a signature pattern for these enzymes.
[1027] Consensus pattern: F-x-[DV]-D-x(2)-G-T-[GSA]-x-[IV]-x-[LIVMA]-[GAST](2)-[LIVMF](2)-
[1028] [1] Artus N.N., Edwards G.E. FEBS Lett. 182:225-233(1985). [2] Loeber G., Infante A.A., Maurer-Fogy I.,
Krystek E., Dworkin M.B. J. Biol. Chem. 266:3018-3021(1991). [3] Long J.J., Wang J.-L., Berry J.O. J. Biol. Chem.
269:2827-2833(1994).
[1029] 356. (matrixin)
Matrixins cysteine switch (aka peptidase_M10)

[1030] Mammalian extracellular matrix metalloproteinases (EC 3.4.24.-), also known as matrixins [1] (see
<PDOC00129>), are zinc-dependent enzymes. They are secreted by cells in an inactive form (zymogen) that differs
from the mature enzyme by the presence of an N-terminal propeptide. A highly conserved octapeptide is found two
residues downstream of the C-terminal end of the propeptide. This region has been shown to be involved in
autoinhibition of matrixins [2,3]; a cysteine within the octapeptide chelates the active site zinc ion, thus inhibiting the enzyme.
This region has been called the 'cysteine switch' or 'autoinhibitor region'.
[1031] A cysteine switch has been found in the following zinc proteases:

- MMP-1 (EC 3.4.24.7) (interstitial collagenase).
- MMP-2 (EC 3.4.24.24) (72 Kd gelatinase).
- MMP-3 (EC 3.4.24.17) (stromelysin-1).
- MMP-7 (EC 3.4.24.23) (matrilysin).
- MMP-8 (EC 3.4.24.34) (neutrophil collagenase).
- MMP-9 (EC 3.4.24.35) (92 Kd gelatinase).
- MMP-10 (EC 3.4.24.22) (stromelysin-2).
- MMP-11 (EC 3.4.24.-) (stromelysin-3).
- MMP-12 (EC 3.4.24.65) (macrophage metalloelastase).
- MMP-13 (EC 3.4.24.-) (collagenase 3).
- MMP-14 (EC 3.4.24.-) (membrane-type matrix metalloproteinase 1).
- MMP-15 (EC 3.4.24.-) (membrane-type matrix metalloproteinase 2).
- MMP-16 (EC 3.4.24.-) (membrane-type matrix metalloproteinase 3).
- Sea urchin hatching enzyme (EC 3.4.24.12) (envelysin) [4].
- Chlamydomonas reinhardtii gamete lytic enzyme (GLE) [5].

[1032] Consensus pattern: P-R-C-[GN]-x-P-[DR]-[LIVSAPKQ] [C chelates the zinc ion]. Sequences known to belong
to this class detected by the pattern: ALL, except for cat MMP-7 and mouse MMP-11.

[1] Woessner J. Jr. FASEB J. 5:2145-2154(1991).
[2] Sanchez-Lopez R., Nicholson R., Gesnel M.-C., Matrisian L.M., Breathnach R. J. Biol. Chem. 263:11892-11899(1988).
[3] Park A.J., Matrisian L.M., Kells A.F., Pearson R., Yuan Z., Navre M. J. Biol. Chem. 266:1584-1590(1991).
[4] Lepage T., Gache C. EMBO J. 9:3003-3012(1990).
[5] Kinoshita T., Fukuzawa H., Shimada T., Saito T., Matsuda Y. Proc. Natl. Acad. Sci. U.S.A. 89:4693-4697(1992).

[1033] 357. Vertebrate metallothioneins signature (metalthio)

Metallothioneins (MT) [1,2,3] are small proteins which bind heavy metals such as zinc, copper, cadmium, nickel, etc.,
through clusters of thiolate bonds. MT's occur throughout the animal kingdom and are also found in higher plants, fungi
and some prokaryotes. On the basis of structural relationships MT's have been subdivided into three classes. Class I
includes mammalian MT's as well as MT's from crustaceans and molluscs, but with clearly related primary structure.
Class II groups together MT's from various species such as sea urchins, fungi, insects and cyanobacteria which display
none or only very distant correspondence to class I MT's. Class III MT's are atypical polypeptides containing
gamma-glutamylcysteinyl units. Vertebrate class I MT's are proteins of 60 to 68 amino acid residues. 20 of these residues are
cysteines that bind to 7 bivalent metal ions. As a signature pattern, a region that spans 19 residues and which contains
seven of the metal-binding cysteines was chosen; this region is located in the N-terminal section of class I MT's.

[1034] Consensus pattern: C-x-C-[GSTAP]-x(2)-C-x-C-x(2)-C-x-C-x(2)-C-x-K-

[1] Hamer D.H. Annu. Rev. Biochem. 55:913-951(1986).
[2] Kagi J.H.R., Schaffer A. Biochemistry 27:8509-8515(1988).
[3] Binz P.-A. Thesis, 1996, University of Zurich.
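
The consensus pattern given for vertebrate class I metallothioneins is written in PROSITE-style notation, which maps element by element onto a regular expression. The Python sketch below is illustrative only: the helper name `prosite_to_regex` and the 19-residue test peptide are assumptions made for this example and are not taken from the present description or from any sequence database. The converter handles only the notation elements that appear in the patterns of this section (single residues, `x`, bracketed sets, and `(n)` or `(n,m)` repeats).

```python
import re

# One pattern element: 'x', a single residue, or a bracketed set,
# optionally followed by a repeat count such as (2) or (3,4).
_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\])(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Convert a simple PROSITE-style pattern into a Python regex string."""
    parts = []
    # A trailing '-' (as printed in this description) is ignored.
    for element in pattern.strip("-").split("-"):
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = match.groups()
        core = "." if core == "x" else core  # 'x' means any residue
        if low is None:
            parts.append(core)
        elif high is None:
            parts.append(f"{core}{{{low}}}")
        else:
            parts.append(f"{core}{{{low},{high}}}")
    return "".join(parts)


MT_PATTERN = "C-x-C-[GSTAP]-x(2)-C-x-C-x(2)-C-x-C-x(2)-C-x-K-"
MT_REGEX = re.compile(prosite_to_regex(MT_PATTERN))

# Hypothetical peptide constructed to satisfy the pattern (not a real protein).
EXAMPLE_PEPTIDE = "CACGAACACAACACAACAK"

if __name__ == "__main__":
    print(prosite_to_regex(MT_PATTERN))
    print(bool(MT_REGEX.fullmatch(EXAMPLE_PEPTIDE)))
```

In this sketch the matched span is 19 residues long and contains seven cysteines, in line with the description of the signature region. Searching a full-length sequence would use `MT_REGEX.search(sequence)` rather than `fullmatch`.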

[1035] 358. Mitochondrial energy transfer proteins signature (mito_carr)

Different types of substrate carrier proteins involved in energy transfer are found in the inner mitochondrial membrane
[1 to 5]. These are:

- The ADP,ATP carrier protein (AAC) (ADP/ATP translocase), which exports ATP into the cytosol and imports ADP into
the mitochondrial matrix. The sequence of AAC has been obtained from various mammalian, plant and fungal species.
- The 2-oxoglutarate/malate carrier protein (OGCP), which exports 2-oxoglutarate into the cytosol and imports malate
or other dicarboxylic acids into the mitochondrial matrix. This protein plays an important role in several metabolic
processes, such as the malate/aspartate and the oxoglutarate/isocitrate shuttles.
- The phosphate carrier protein, which transports phosphate groups from the cytosol into the mitochondrial matrix.
- The brown fat uncoupling protein (UCP), which dissipates oxidative energy into heat by transporting protons from
the cytosol into the mitochondrial matrix.
- The tricarboxylate transport protein (or citrate transport protein), which is involved in citrate-H+/malate exchange.
It is important for the bioenergetics of hepatic cells as it provides a carbon source for fatty acid and sterol
biosyntheses and NAD for the glycolytic pathway.
- The Grave's disease carrier protein (GDC), a protein of unknown function recognized by IgG in patients with active
Grave's disease.
- Yeast mitochondrial proteins MRS3 and MRS4. The exact function of these proteins is not known. They suppress a
mitochondrial splice defect in the first intron of the COB gene and may act as carriers, exerting their suppressor
activity by modulating solute concentrations in the mitochondrion.
- Yeast mitochondrial FAD carrier protein (gene FLX1).
- Yeast protein ACR1 [6], which seems essential for acetyl-CoA synthetase activity.
- Yeast protein PET8.
- Yeast protein PMT.
- Yeast protein RIM2.
- Yeast protein YHM1/SHM1.
- Yeast protein YMC1.
- Yeast protein YMC2.
- Yeast hypothetical proteins YBR291c, YEL006w, YER053c, YFR045w, YHR002w, and YIL006w.
- Caenorhabditis elegans hypothetical protein K11H3.3.

Two other proteins have been found to belong to this family, yet are not localized in the mitochondrial inner membrane:

- Maize amyloplast Brittle-1 protein. This protein, found in the endosperm of kernels, could play a role in amyloplast
membrane transport.
- Candida boidinii peroxisomal membrane protein PMP47 [7]. PMP47 is an integral membrane protein of the
peroxisome and it may play a role as a transporter.

These proteins all seem to be evolutionarily related. Structurally, they consist of three tandem repeats of a domain of
approximately one hundred residues. Each of these domains contains two transmembrane regions. As a signature
pattern, one of the most conserved regions in the repeated domain was selected, located just after the first
transmembrane region.

[1036] Consensus pattern: P-x-[DE]-x-[LIVAT]-[RK]-x-[LRH]-[LIVMFY]-[QGAIVM]-

[1] Klingenberg M. Trends Biochem. Sci. 15:108-112(1990).
[2] Walker J.E. Curr. Opin. Struct. Biol. 2:519-526(1992).
[3] Kuan J., Saier M.H. Jr. CRC Crit. Rev. Biochem. 28:209-233(1993).
[4] Kuan J., Saier M.H. Jr. Res. Microbiol. 144:671-672(1993).
[5] Nelson D.R., Lawson J.E., Klingenberg M., Douglas M.G. J. Mol. Biol. 230:1159-1170(1993).
[6] Palmieri F. FEBS Lett. 346:48-54(1994).
[7] Jank B., Habermann B., Schweyen R.J., Link T.A. Trends Biochem. Sci. 18:427-428(1993).






[1037] 359. Prokaryotic molybdopterin oxidoreductases signatures (molybdopterin)

A number of different prokaryotic oxidoreductases that require and bind a molybdopterin cofactor have been shown
[1,2,3] to share a number of regions of sequence similarity. These enzymes are:

- Escherichia coli respiratory nitrate reductase (EC 1.7.99.4). This enzyme complex allows the bacteria to use nitrate
as an electron acceptor during anaerobic growth. The enzyme is composed of three different chains: alpha, beta and
gamma. The alpha chain (gene narG) is the molybdopterin-binding subunit. Escherichia coli encodes for a second,
closely related, nitrate reductase complex which also contains a molybdopterin-binding alpha chain (gene narZ).
- Escherichia coli anaerobic dimethyl sulfoxide reductase (DMSO reductase). DMSO reductase is the terminal reductase
during anaerobic growth on various sulfoxide and N-oxide compounds. DMSO reductase is composed of three chains:
A, B and C. The A chain (gene dmsA) binds molybdopterin.
- Escherichia coli biotin sulfoxide reductases (genes bisC and bisZ). This enzyme reduces a spontaneous oxidation
product of biotin, BDS, back to biotin. It may serve as a scavenger, allowing the cell to use biotin sulfoxide as a
biotin source.
- Methanobacterium formicicum formate dehydrogenase (EC 1.2.1.2). The alpha chain (gene fdhA) of this dimeric enzyme
binds a molybdopterin cofactor.
- Escherichia coli formate dehydrogenases -H (gene fdhF), -N (gene fdnG) and -O (gene fdoG). These enzymes are
responsible for the oxidation of formate to carbon dioxide. In addition to molybdopterin, the alpha (catalytic) subunit
also contains an active site selenocysteine.
- Wolinella succinogenes polysulfide reductase chain. This enzyme is a component of the phosphorylative electron
transport system with polysulfide as the terminal acceptor. It is composed of three chains: A, B and C. The A chain
(gene psrA) binds molybdopterin.
- Salmonella typhimurium thiosulfate reductase (gene phsA).
- Escherichia coli trimethylamine-N-oxide reductase (EC 1.6.6.9) (gene torA) [4].
- Nitrate reductase (EC 1.7.99.4) from Klebsiella pneumoniae (gene nasA), Alcaligenes eutrophus, Escherichia coli,
Rhodobacter sphaeroides, Thiosphaera pantotropha (gene napA), and Synechococcus PCC 7942 (gene narB).

These proteins range from 715 amino acids (fdhF) to 1246 amino acids (narZ) in size. Three signature patterns for
these enzymes were derived. The first is based on a conserved region in the N-terminal section and contains two
cysteine residues perhaps involved in binding the molybdopterin cofactor. It should be noted that this region is not
present in bisC. The second pattern is derived from a conserved region located in the central part of these enzymes.

[1038] Consensus pattern: [STAN]-x-[CH]-x(2,3)-C-[STAG]-[GSTVMF]-x-C-x-[LIVMFYW]-x-[LIVMA]-x(3,4)-[DENQKHT]-

Consensus pattern: [STA]-x-[STAC](2)-x(2)-[STA]-D-[LIVMY](2)-L-P-x-[STAC](2)-x(2)-E-

Consensus pattern: A-x(3)-[GDT]-I-x-[DNQTK]-x-[DEA]-x-[LIVM]-x-[LIVMC]-x-[NS]-x(2)-[GS]-x(5)-A-x-[LIVM]-[ST]-

[1] Wootton J.C., Nicolson R.E., Cock J.M., Walters D.E., Burke J.F., Doyle W.A., Bray R.C. Biochim. Biophys.
Acta 1057:157-185(1991).
[2] Bilous P.T., Cole S.T., Anderson W.K., Weiner J.H. Mol. Microbiol. 2:785-795(1988).
[3] Trieber C.A., Rothery R.A., Weiner J.H. J. Biol. Chem. 269:7103-7109(1994).
[4] Mejean V., Iobbi-Nivol C., Lepelletier M., Giordano G., Chippaux M., Pascal M.-C. Mol. Microbiol. 11:1169-1179(1994).

[1039] 360. Bacterial mutT domain signature

The bacterial mutT protein is involved in the GO system [1] responsible for removing an oxidatively damaged form of
guanine (8-hydroxyguanine or 7,8-dihydro-8-oxoguanine) from DNA and the nucleotide pool. 8-oxo-dGTP is inserted
opposite to dA and dC residues of template DNA with almost equal efficiency, thus leading to A.T to G.C transversions.
MutT specifically degrades 8-oxo-dGTP to the monophosphate with the concomitant release of pyrophosphate. MutT
is a small protein of about 12 to 15 Kd. It has been shown [2,3] that a region of about 40 amino acid residues, which
is found in the N-terminal part of mutT, can also be found in a variety of other prokaryotic, viral, and eukaryotic proteins.
These proteins are:

- Streptococcus pneumoniae mutX.
- A mutT homolog from plasmid pSAM2 of Streptomyces ambofaciens.
- Bartonella bacilliformis invasion protein A (gene invA).
- Escherichia coli dATP pyrophosphohydrolase.
- Protein D260 from African swine fever viruses.
- Proteins D9 and D10 from a variety of poxviruses.
- Mammalian 7,8-dihydro-8-oxoguanine triphosphatase (EC 3.1.6.-) [4].
- Mammalian diadenosine 5',5'''-P1,P4-tetraphosphate asymmetrical hydrolase (Ap4Aase) (EC 3.6.1.17) [5], which
cleaves A-5'-PPPP-5'A to yield AMP and ATP.
- A protein encoded on the antisense RNA of the basic fibroblast growth factor gene in higher vertebrates.
- Yeast protein YSA1.
- Escherichia coli hypothetical protein yfaO.
- Escherichia coli hypothetical protein ygdU and HI0901, the corresponding Haemophilus influenzae protein.
- Escherichia coli hypothetical protein yjaD and HI0432, the corresponding Haemophilus influenzae protein.
- Escherichia coli hypothetical protein yrfE.
- Bacillus subtilis hypothetical protein yqkG.
- Bacillus subtilis hypothetical protein yzgD.
- Yeast hypothetical protein YGL067w.

[1040] It is proposed [2] that the conserved domain could be involved in the active center of a family of
pyrophosphate-releasing NTPases. As a signature pattern, the core region of the domain was selected; it contains four
conserved glutamate residues.

[1041] Consensus pattern: G-x(5)-E-x(4)-[STAGC]-[LIVMAC]-x-R-E-[LIVMFT]-x-E-E-

[1] Michaels M.L., Miller J.H. J. Bacteriol. 174:6321-6325(1992).
[2] Koonin E.V. Nucleic Acids Res. 21:4847-4847(1993).
[3] Mejean V., Salles C., Bullions L.C., Bessman M.J., Claverys J.-P. Mol. Microbiol. 11:323-330(1994).
[4] Sakumi K., Furuichi M., Tsuzuki T., Kakuma T., Kawabata S., Maki H., Sekiguchi M. J. Biol. Chem.
268:23524-23530(1993).
[5] Thorne N.M.H., Hankin S., Wilkinson M.C., Nunez C., Barraclough R., McLennan A.G. Biochem. J. 311:717-721
(1995).
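
The statement that the mutT core signature contains four conserved glutamate residues can be checked mechanically against the consensus pattern. The Python sketch below is a hedged illustration: the pattern string is the reading of the consensus line adopted here, and the helper names `pattern_elements`, `pattern_span` and `invariant_count` are hypothetical names introduced for this example only.

```python
import re

# Assumed reading of the mutT consensus pattern from this description.
MUTT_PATTERN = "G-x(5)-E-x(4)-[STAGC]-[LIVMAC]-x-R-E-[LIVMFT]-x-E-E-"

_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\])(?:\((\d+)(?:,(\d+))?\))?")


def pattern_elements(pattern):
    """Return a list of (allowed_residues, min_count, max_count) tuples."""
    elements = []
    for element in pattern.strip("-").split("-"):
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = match.groups()
        low = int(low) if low else 1
        high = int(high) if high else low
        elements.append((core.strip("[]"), low, high))
    return elements


def pattern_span(pattern):
    """Minimum and maximum number of residues covered by the pattern."""
    elements = pattern_elements(pattern)
    return (sum(low for _, low, _ in elements),
            sum(high for _, _, high in elements))


def invariant_count(pattern, residue):
    """Number of positions at which only `residue` is allowed."""
    return sum(low for core, low, _ in pattern_elements(pattern)
               if core == residue)


if __name__ == "__main__":
    print(pattern_span(MUTT_PATTERN))
    print(invariant_count(MUTT_PATTERN, "E"))
```

Under this reading the pattern covers a fixed window of 20 residues, four of which are invariant glutamates, which agrees with the description of the core region.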

[1042] 361. Myb DNA-binding domain repeat signatures

The retroviral oncogene v-myb and its cellular counterpart c-myb encode nuclear DNA-binding proteins that
specifically recognize the sequence YAAC(G/T)G [1]. The myb family also includes the following proteins:

- Drosophila D-myb [2].
- Vertebrate myb-like proteins A-myb and B-myb [3].
- Maize C1 protein, a trans-acting factor which controls the expression of genes involved in anthocyanin biosynthesis.
- Maize P protein [4], a trans-acting factor which regulates the biosynthetic pathway of a flavonoid-derived pigment
in certain floral tissues.
- Arabidopsis thaliana protein GL1 [5], required for the initiation of differentiation of leaf hair cells (trichomes).
- A number of myb/c1-related proteins in maize and barley, whose roles are not yet known [4].
- Yeast BAS1 [7], a transcriptional activator for the HIS4 gene.
- Yeast REB1 [8], which recognizes sites within both the enhancer and the promoter of rRNA transcription, as well as
upstream of many genes transcribed by RNA polymerase II.
- Fission yeast cdc5, a possible transcription factor whose activity is required for cell cycle progression and growth
during G2.
- Fission yeast myb1, which regulates telomere length and function.
- Yeast hypothetical protein YMR213w.

One of the most conserved regions in all of these proteins is a domain of 160 amino acids. It consists of three tandem
repeats of 51 to 53 amino acids. In myb, this repeat region has been shown [9] to be involved in DNA-binding. The
major part of the first repeat is missing in retroviral v-myb sequences and in plant myb-related proteins. Yeast REB1
differs from the other proteins in this family in having a single myb-like domain. As shown in the following schematic
representation, two signature patterns for myb-like domains were developed. The first is located in the N-terminal
section; the second spans the C-terminal extremity of the domain.



xxxxxxxxxWxxxEDxxxxxxxxxxxxxxWxxIxxxxxxRxxxxxxxxWxxxx ********* 
'*': position of the patterns.



[1043] Consensus pattern: W-[ST]-x(2)-E-[DE]-x(2)-[LIV]-

Consensus pattern: W-x(2)-[LI]-[SAG]-x(4,5)-R-x(8)-[YW]-x(3)-[LIVM]-

Note: this pattern detects the three copies of the domain in myb, D-myb, A-myb and B-myb; the second of the two
complete copies of plant myb-related proteins; and the last two copies of yeast BAS1.

[1] Biedenkapp H., Borgmeyer U., Sippel A.E., Klempnauer K.-H. Nature 335:835-837(1988).
[2] Peters C.W.B., Sippel A.E., Vingron M., Klempnauer K.-H. EMBO J. 6:3085-3090(1987).
[3] Nomura N., Takahashi M., Matsui M., Ishii S., Date T., Sasamoto S., Ishizaki R. Nucleic Acids Res.
16:11075-11090(1988).
[4] Grotewold E., Athma P., Peterson T. Proc. Natl. Acad. Sci. U.S.A. 88:4587-4591(1991).
[5] Oppenheimer D.G., Herman P.L., Sivakumaran S., Esch J., Marks M.D. Cell 67:483-493(1991).
[6] Marocco A., Wissenbach M., Becker D., Paz-Ares J., Saedler H., Salamini F., Rohde W. Mol. Gen. Genet.
216:183-187(1989).
[7] Tice-Baldwin K., Fink G.R., Arndt K.T. Science 246:931-935(1989).
[8] Ju Q., Morrow B.E., Warner J.R. Mol. Cell. Biol. 10:5226-5234(1990).
[9] Klempnauer K.-H., Sippel A.E. EMBO J. 6:2719-2725(1987).

[1044] 362. NAD-dependent glycerol-3-phosphate dehydrogenase signature
NAD-dependent glycerol-3-phosphate dehydrogenase (EC 1.1.1.8) (GPD) catalyzes the reversible reduction of
dihydroxyacetone phosphate to glycerol-3-phosphate. It is a eukaryotic cytosolic homodimeric protein of about 40 Kd. As
a signature pattern, a glycine-rich region that is probably [1] involved in NAD-binding was selected.
[1045] Consensus pattern: G-[AT]-[LIVM]-K-[DN]-[LIVM](2)-A-x-[GA]-x-G-[LIVMF]-x-[DE]-G-[LIVM]-x-[LIVMFYW]-
G-x-N-

[1046] [1] Otto J., Argos P., Rossmann M.G. Eur. J. Biochem. 109:325-330(1980).
[1047] 363. Nucleosome assembly protein (NAP)

[1048] It is thought that NAPs may be involved in regulating gene expression as a result of histone accessibility [1].

[1] Rodriguez P, Munroe D, Prawitt D, Chu LL, Bric E, Kim J, Reid LH, Davies C, Nakagama H, Loebbert R,
Winterpacht A, Petruzzi MJ, Higgins MJ, Nowak N, Evans G, Shows T, Weissman BE, Zabel B, Housman DE,
Pelletier J; Genomics 1997;44:253-265.

[2] Schnieders F, Dork T, Arnemann J, Vogel T, Werner M, Schmidtke J; Hum Mol Genet 1996;5:1801-1807.

[1049] 364. NB-ARC domain
van der Biezen EA, Jones JD; Curr Biol 1998;8:R226-R227.

[1050] 365. Nucleoside diphosphate kinases active site

[1051] Nucleoside diphosphate kinases (EC 2.7.4.6) (NDK) [1] are enzymes required for the synthesis of nucleoside
triphosphates (NTP) other than ATP. They provide NTPs for nucleic acid synthesis, CTP for lipid synthesis, UTP for
polysaccharide synthesis and GTP for protein elongation, signal transduction and microtubule polymerization. In
eukaryotes, there seems to be a small family of NDK isozymes, each of which acts in a different subcellular compartment
and/or has a distinct biological function. Eukaryotic NDK isozymes are hexamers of two highly related chains (A and
B) [2]. By random association (A6, A5B, ..., AB5, B6), these two kinds of chain form isoenzymes differing in their
isoelectric point. NDK are proteins of 17 Kd that act via a ping-pong mechanism in which a histidine residue is
phosphorylated by transfer of the terminal phosphate group from ATP. In the presence of magnesium, the phosphoenzyme
can transfer its phosphate group to any NDP to produce an NTP. NDK isozymes have been sequenced from prokaryotic
and eukaryotic sources. It has also been shown [3] that the Drosophila awd (abnormal wing discs) protein is a
microtubule-associated NDK. Mammalian NDK is also known as metastasis inhibition factor nm23. The sequence of
NDK has been highly conserved through evolution. There is a single histidine residue conserved in all known NDK
isozymes, which is involved in the catalytic mechanism [2]. Our signature pattern contains this residue.
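
The series of hexamer compositions formed by random association of the A and B chains (A6 through B6) can be enumerated explicitly. The Python sketch below is illustrative only. The function names are hypothetical, and the binomial weighting assumes that each of the six positions is filled independently with chain A at a fixed probability; that is one simple model of "random association" and is not stated in the description.

```python
from math import comb


def hexamer_compositions(subunits=6):
    """List every A/B chain composition of the oligomer, from A6 to B6."""
    def term(chain, count):
        if count == 0:
            return ""
        return chain if count == 1 else f"{chain}{count}"

    return [term("A", subunits - b) + term("B", b)
            for b in range(subunits + 1)]


def composition_fractions(fraction_a, subunits=6):
    """Expected fraction of each composition under independent (binomial)
    incorporation of chain A with probability `fraction_a` (an assumption)."""
    fraction_b = 1.0 - fraction_a
    return [comb(subunits, b) * fraction_a ** (subunits - b) * fraction_b ** b
            for b in range(subunits + 1)]


if __name__ == "__main__":
    names = hexamer_compositions()
    fractions = composition_fractions(0.5)
    for name, fraction in zip(names, fractions):
        print(f"{name:5s} {fraction:.4f}")
```

With equal amounts of both chains, this model yields seven isoenzyme compositions, with the mixed A3B3 form the most abundant and the pure A6 and B6 forms the rarest.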

[1052] Consensus pattern: N-x(2)-H-[GA]-S-D-[SA]-[LIVMPKNE] [H is the putative active site residue]-

[1] Parks R., Agarwal R. (In) The Enzymes (3rd edition) 8:307-334(1973).
[2] Gilles A.-M., Presecan E., Vonica A., Lascu I. J. Biol. Chem. 266:8784-8789(1991).
[3] Biggs J., Hersperger E., Steeg P.S., Liotta L.A., Shearn A. Cell 63:933-940(1990).

[1053] 366. Nitrite and sulfite reductases iron-sulfur/siroheme-binding site (NIR_SIR). Nitrite reductases (NiR) [1]
catalyze the reduction of nitrite into ammonium, the second step in the assimilation of nitrate. There are two types of
NiR: the higher plant chloroplastic form of NiR (EC 1.7.7.1) is a monomeric protein that uses reduced ferredoxin as
the electron donor, while fungal and bacterial NiR (EC 1.6.6.4) are homodimeric proteins that use NAD(P)H as the
electron donor. Both forms of NiR contain siroheme-Fe and iron-sulfur centers. Sulfite reductase (NADPH) (EC
1.8.1.2) (SiR) [2] is the bacterial enzyme that catalyzes the reduction of sulfite to sulfide. SiR is an oligomeric enzyme
with a subunit composition of alpha(8)-beta(4); the alpha component is a flavoprotein (SiR-FP), while the beta
component is a siroheme, iron-sulfur protein (SiR-HP). Sulfite reductase (ferredoxin) (EC 1.8.7.1) [3] is a cyanobacterial
and plant monomeric enzyme that also catalyzes the reduction of sulfite to sulfide. Anaerobic sulfite reductase (EC
1.8.1.-) (ASR) [4] is a bacterial enzyme that catalyzes the NADH-dependent reduction of sulfite to sulfide. ASR is an
oligomeric enzyme composed of three different subunits. The C component (gene asrC) seems to be a siroheme,
iron-sulfur protein. These enzymes share a region of sequence similarity in their C-terminal half; this region, which spans
about 80 amino acids, includes four conserved cysteine residues. Two of the Cys are grouped together at the beginning
of the domain, and the two others are grouped in the middle of the domain. The cysteines are involved in the binding
of the iron-sulfur center; the last one also binds the siroheme group [2]. A signature pattern from the region around the
second cluster of cysteines was derived.

[1054] Consensus pattern: [STV]-G-C-x(3)-C-x(6)-[DE]-[LIVMF]-[GAT]-[LIVMF] [The two C's are iron-sulfur ligands]-

[1] Campbell W.H., Kinghorn J.R. Trends Biochem. Sci. 15:315-319(1990).
[2] Crane B.R., Siegel L.M., Getzoff E.D. Science 270:59-67(1995).
[3] Gisselmann G., Klausmeier P., Schwenn J.D. Biochim. Biophys. Acta 1144:102-106(1993).
[4] Huang C.J., Barrett E.L. J. Bacteriol. 173:1544-1553(1991).

[1055] 367. (NMT) Myristoyl-CoA:protein N-myristoyltransferase signatures. Myristoyl-CoA:protein
N-myristoyltransferase (EC 2.3.1.97) (Nmt) [1] is the enzyme responsible for transferring a myristate group onto
the N-terminal glycine of a number of cellular eukaryotic and viral proteins. Nmt is a monomeric protein of about 50 to
60 Kd whose sequence appears to be well conserved. Two highly conserved regions have been developed as signature
patterns. The first one is located in the central section, the second in the C-terminal part.
[1056] Consensus pattern: E-I-N-F-L-C-x-H-K-
Consensus pattern: K-F-G-x-G-D-G-

[1057] [1] Rudnick D.A., McWherter C.A., Gokel G.W., Gordon J.I. Adv. Enzymol. 67:375-430(1993).
[1058] 368. ADP-glucose pyrophosphorylase signatures (NTP_transferase)

[1059] ADP-glucose pyrophosphorylase (glucose-1-phosphate adenylyltransferase) [1,2] (EC 2.7.7.27) catalyzes a
very important step in the biosynthesis of alpha 1,4-glucans (glycogen or starch) in bacteria and plants: synthesis of
the activated glucosyl donor, ADP-glucose, from glucose-1-phosphate and ATP. ADP-glucose pyrophosphorylase is a
tetrameric, allosterically regulated enzyme. It is a homotetramer in bacteria, while in plant chloroplasts and amyloplasts
it is a heterotetramer of two different, yet evolutionarily related, subunits. There are a number of conserved regions in
the sequence of bacterial and plant ADP-glucose pyrophosphorylase subunits. Three of these regions were selected
as signature patterns. The first two are N-terminal and have been proposed to be part of the allosteric and/or substrate-
binding sites in the Escherichia coli enzyme (gene glgC). The third pattern corresponds to a conserved region in the
central part of the enzymes.

[1060] Consensus pattern: [AG]-G-G-x-G-[STK]-x-L-x(2)-L-[TA]-x(3)-A-x-P-A-[LV]-
Consensus pattern: W-[FY]-x-G-[ST]-A-[DNSH]-[AS]-[LIVMFYW]-

Consensus pattern: [APV]-[GS]-M-G-[LIVMN]-Y-[IVC]-[LIVMFY]-x(2)-[DENPHK]-

[1] Nakata P.A., Greene T.W., Anderson J.M., Smith-White B.J., Okita T.W., Preiss J. Plant Mol. Biol. 17:1089-1093
(1991).

[2] Preiss J., Ball K., Hutney J., Smith-White B.J., Li L., Okita T.W. Pure Appl. Chem. 63:535-544(1991).

[1061] 369. Sodium/hydrogen exchanger family

[1062] Na/H antiporters are key transporters in maintaining the pH of actively metabolizing cells. The molecular
mechanisms of antiport are unclear.
These antiporters contain 10-12 transmembrane regions (M) at the amino-terminus and a large cytoplasmic region at
the carboxyl terminus. The transmembrane regions M3-M12 share identity with other members of the family. The M6
and M7 regions are highly conserved. Thus, this is thought to be the region that is involved in the transport of sodium
and hydrogen ions. The cytoplasmic region has little similarity throughout the family.

[1063] [1] Dibrov P, Fliegel L; FEBS Lett 1998;424:1-5. [2] Orlowski J, Grinstein S; J Biol Chem 1997;272:
22373-22376. [3] Numata M, Petrecca K, Lake N, Orlowski J; J Biol Chem 1998;273:6951-6959.
[1064] J/0. tSodi urn: sulfate sympoiter family signature (Na_sulph_symp) 

Integral membrane proteins that mediate the intake of a wide variety of molecules with the concomitant uptake of
sodium ions (sodium symporters) can be grouped, on the basis of sequence and functional similarities, into a number
of distinct families. One of these families currently consists of the following proteins:

- Mammalian sodium/sulfate cotransporter [1].
- Mammalian renal sodium/dicarboxylate cotransporter [2], which transports succinate and citrate.
- Mammalian intestinal sodium/dicarboxylate cotransporter.
- Chlamydomonas reinhardtii putative sulfur deprivation response regulator SAC1 [3].
- Caenorhabditis elegans hypothetical proteins B0285.6, K08E5.2 and R107.1.
- Escherichia coli hypothetical protein yfbS.
- Haemophilus influenzae hypothetical protein HI0608.
- Synechocystis strain PCC 6803 hypothetical protein sll0640.
- Methanococcus jannaschii hypothetical protein MJ0672.

These transporters are proteins of from 430 to 620 amino acids which are highly hydrophobic and which probably
contain about 12 transmembrane regions. As a signature pattern a conserved region was selected which is located in
or near the penultimate transmembrane region.

[1065] Consensus pattern: [STACP]-S-x(2)-F-x(2)-P-[LIVM]-[GSA]-x(3)-N-x-[LIVM]-V

[1] Markovich D., Forgo J., Stange G., Biber J., Murer H. Proc. Natl. Acad. Sci. U.S.A. 90:8073-8077(1993).

[2] Pajor A.M. Am. J. Physiol. 270:F642-F648(1996).
[3] Davies J.P., Yildiz F.H., Grossman A.R. EMBO J. 15:2150-2159(1996).
[1066] 371. NifU-like domain

[1067] This is an alignment of the carboxy-terminal domain. This is the only common region between the NifU protein
from nitrogen-fixing bacteria and rhodobacterial species. The biochemical function of NifU is unknown [1].
[1068] Ouzounis C, Bork P, Sander C; Trends Biochem Sci 1994;19:199-200.

[1069] 372. Nitrilases / cyanide hydratase signatures

Nitrilases (EC 3.5.5.1) are enzymes that convert nitriles into their corresponding acids and ammonia. They are wide-
spread in microbes as well as in plants, where they convert indole-3-acetonitrile to the hormone indole-3-acetic acid.
A conserved cysteine has been shown [1,2] to be essential for enzyme activity; it seems to be involved in a nucleophilic
attack on the nitrile carbon atom. Cyanide hydratase (EC 4.2.1.66) converts HCN to formamide. In phytopathogenic
fungi it is used to avoid the toxic effect of cyanide released by wounded plants [3]. The sequence of cyanide hydratase
is evolutionarily related to that of nitrilases. Yeast hypothetical proteins YIL164c and YIL165c also belong to this family.
As signature patterns for these enzymes, two conserved regions were selected. The first is located in the N-terminal
section while the second, which contains the active site cysteine, is located in the central section.
[1070] Consensus pattern G-n24LIVMFVKC}-y-ElF]-<-E-x[2j-Et-fVMj-v-G-y r -P- 

Consensus pattern: G-[GAQ]-x(2)-C-[WA]-E-[NH]-x(2)-[PST]-[LIVMFYS]-x-[KR] [C is the active site residue]

[1] Kobayashi M., Izui H., Nagasawa T., Yamada H. Proc. Natl. Acad. Sci. U.S.A. 90:247-251(1993).

[2] Kobayashi M., Komeda H., Yanaka N., Nagasawa T., Yamada H. J. Biol. Chem. 267:20746-20751(1992).

[3] Wang P., VanEtten H.D. Biochem. Biophys. Res. Commun. 187:1048-1054(1992).

[1071] 373. NusB family 

[1072] The NusB protein is involved in the regulation of rRNA biosynthesis by transcriptional antitermination.
[1073] Huenges M, Rolz C, Gschwind R, Peteranderl R, Berglechner F, Richter G, Bacher A, Kessler H, Gemmecker
G; EMBO J 1998;17:4092-4100.

[1074] 374. (Neur_chan) Neurotransmitter-gated ion-channels signature

Neurotransmitter-gated ion-channels [1,2,3,4] provide the molecular basis for rapid signal transmission at chemical
synapses. They are post-synaptic oligomeric transmembrane complexes that transiently form an ionic channel upon the
binding of a specific neurotransmitter. Presently, the sequences of subunits from five types of neurotransmitter-gated
receptors are known:

- The nicotinic acetylcholine receptor (AchR), an excitatory cation channel. In the motor endplates of vertebrates,
it is composed of four different subunits (alpha, beta, gamma and delta or epsilon) with a molar stoichiometry
of 2:1:1:1. In neurons, the AchR receptor is composed of two different types of subunits: alpha and non-alpha
(also called beta). Nicotinic AchRs are also found in invertebrates.
- The glycine receptor, an inhibitory chloride ion channel. The glycine receptor is a pentamer composed of two
different subunits (alpha and beta).
- The gamma-aminobutyric-acid (GABA) receptor, which is also an inhibitory chloride ion channel. The quaternary
structure of the GABA receptor is complex; at least four classes of subunits are known to exist (alpha, beta, gamma
and delta) and there are many variants in each class (for example, six variants of the alpha class have already
been sequenced).
- The serotonin 5HT3 receptor. Serotonin is a biogenic hormone that functions as a neurotransmitter, a hormone and
a mitogen. There are seven major groups of serotonin receptors; six of these groups (5HT1, 5HT2 and 5HT4 to 5HT7)
transduce extracellular signal by activating G proteins, while 5HT3 is a ligand-gated cation-specific ion channel
which, when activated, causes fast depolarizing responses in neurons.
- The glutamate receptor, an excitatory cation channel. Glutamate is the main excitatory neurotransmitter in the
brain. At least three different types of glutamate receptors have been described and are named according to their
selective agonists (kainate, N-methyl-D-aspartate (NMDA) and quisqualate).

All known sequences of subunits from neurotransmitter-gated ion-channels are structurally related. They
are composed of a large extracellular glycosylated N-terminal ligand-binding domain, followed by three hydrophobic
transmembrane regions which form the ionic channel, followed by an intracellular region of variable length. A fourth
hydrophobic region is found at the C-terminal of the sequence. The sequences of subunits from the AchR, GABA, 5HT3
and Gly receptors are clearly evolutionarily related and share many regions of sequence similarity. These sequence
similarities are either absent or very weak in the Glu receptors. In the N-terminal extracellular domain of AchR/GABA/
5HT3/Gly receptors there are two conserved cysteine residues which, in AchR, have been shown to form a disulfide
bond essential to the tertiary structure of the receptor. A number of amino acids between the two disulfide-bonded
cysteines are also conserved. Therefore this region was used as a signature pattern for this subclass of proteins.
[1075] Consensus pattern: C-x-[LIVMFQ]-x-[LIVMF]-x(2)-[FY]-P-x-D-x(3)-C [The two C's are linked by a disulfide
bond]

[1] Stroud R.M., McCarthy M.P., Shuster M. Biochemistry 29:11009-11023(1990).

[2] Betz H. Neuron 5:383-392(1990).

[3] Dingledine R., Myers S.J., Nicholas R.A. FASEB J. 4:2636-2645(1990).
[4] Barnard E.A. Trends Biochem. Sci. 17:368-374(1992).
[1076] 375. Orotidine 5'-phosphate decarboxylase active site

Orotidine 5'-phosphate decarboxylase (EC 4.1.1.23) (OMPdecase) [1,2] catalyzes the last step in the de novo biosyn-
thesis of pyrimidines, the decarboxylation of OMP into UMP. In higher eukaryotes OMPdecase is part, with orotate
phosphoribosyltransferase, of a bifunctional enzyme, while the prokaryotic and fungal OMPdecases are monofunctional
proteins. Some parts of the sequence of OMPdecase are well conserved across species. The best conserved region is
located in the N-terminal half of OMPdecases and is centered around a lysine residue which is essential for the catalytic
function of the enzyme. This region has been developed as a signature pattern.

[1077] Consensus pattern: [LIVMFTA]-[LIVMF]-x-D-x-K-x(2)-D-I-[GP]-x-T-[LIVMTA] [K is the active site residue]

[1] Jacquet M., Guilbaud R., Garreau H. Mol. Gen. Genet. 211:441-445(1988).

[2] Kimsey H.H., Kaiser D. J. Biol. Chem. 267:819-824(1992).

[1078] 376. ATP synthase delta (OSCP) subunit signature

ATP synthase (proton-translocating ATPase) (EC 3.6.1.34) [1,2] is a component of the cytoplasmic membrane of eu-
bacteria, the inner membrane of mitochondria, and the thylakoid membrane of chloroplasts. The ATPase complex is
composed of an oligomeric transmembrane sector, called CF(0), which acts as a proton channel, and a catalytic core,
termed coupling factor CF(1).

One of the subunits of the ATPase complex, known as subunit delta in bacteria and chloroplasts or the Oligomycin
Sensitivity Conferral Protein (OSCP) in mitochondria, seems to be part of the stalk that links CF(0) to CF(1). It either
transmits conformational changes from CF(0) into CF(1) or is involved in proton conduction [3].

The different delta/OSCP subunits are proteins of approximately 200 amino-acid residues (once the transit peptide
has been removed in the chloroplast and mitochondrial forms) which show only moderate sequence homology.

The signature pattern used to detect ATPase delta/OSCP subunits is based on a conserved region in the C-terminal
section of these proteins.
25 [1079] Consensus pattern: [L!VMMUVWYT]-x(3Hl!VMTHD^ 

[LiVMHKRHENQ]-x-[GSEN] 

[1] Futai M., Noumi T., Maeda M. Annu. Rev. Biochem. 58:111-136(1989).
[2] Senior A.E. Physiol. Rev. 68:177-231(1988).
[3] Engelbrecht S., Junge W. Biochim. Biophys. Acta 1015:379-390(1990).

[1080] 377. Aspartate and ornithine carbamoyltransferases signature

Aspartate carbamoyltransferase (EC 2.1.3.2) (ATCase) catalyzes the conversion of aspartate and carbamoyl phos-
phate to carbamoylaspartate, the second step in the de novo biosynthesis of pyrimidine nucleotides [1]. In prokaryotes
ATCase consists of two subunits: a catalytic chain (gene pyrB) and a regulatory chain (gene pyrI), while in eukaryotes
it is a domain in a multi-functional enzyme (called URA2 in yeast, rudimentary in Drosophila, and CAD in mammals
[2]) that also catalyzes other steps of the biosynthesis of pyrimidines.

Ornithine carbamoyltransferase (EC 2.1.3.3) (OTCase) catalyzes the conversion of ornithine and carbamoyl phosphate
to citrulline. In mammals this enzyme participates in the urea cycle [3] and is located in the mitochondrial matrix. In
prokaryotes and eukaryotic microorganisms it is involved in the biosynthesis of arginine. In some bacterial species it
is also involved in the degradation of arginine [4] (the arginine deaminase pathway).

It has been shown [5] that these two enzymes are evolutionarily related. The predicted secondary structures of both
enzymes are similar and there are some regions of sequence similarity. One of these regions includes three residues
which have been shown, by crystallographic studies [6], to be implicated in binding the phosphoryl group of carbamoyl
phosphate.

This region was selected as a signature for these enzymes.

Consensus pattern: F-x-[EK]-x-S-[GT]-R-T [S, R, and the 2nd T bind carbamoyl phosphate]

Note: the residue in position 3 of the pattern distinguishes between an ATCase (Glu) and an OTCase (Lys).

[1] Lerner C.G., Switzer R.L. J. Biol. Chem. 261:11158-11165(1986).

[2] Davidson J.N., Chen K.C., Jamison R.S., Musmanno L.A., Kern C.B. BioEssays 15:157-164(1993).

[3] Takiguchi M., Matsubasa T., Amaya Y., Mori M. BioEssays 10:163-166(1989).

[4] Baur H., Stalon V., Falmagne P., Luethi E., Haas D. Eur. J. Biochem. 166:111-117(1987).

[5] Houghton J.E., Bencini D.A., O'Donovan G.A., Wild J.R. Proc. Natl. Acad. Sci. U.S.A. 81:4864-4868(1984).
[6] Ke H.-M., Honzatko R.B., Lipscomb W.N. Proc. Natl. Acad. Sci. U.S.A. 81:4037-4040(1984).
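
The note on position 3 of the carbamoyltransferase signature can be illustrated with a short sketch. It assumes the signature reads F-x-[EK]-x-S-[GT]-R-T, which is a reading of the garbled pattern line and not a confirmed transcription; the function name and the test sequences are hypothetical and only serve the illustration.

```python
import re

# Assumed reading of the signature: F-x-[EK]-x-S-[GT]-R-T.
# The residue at position 3 ([EK]) is captured so it can be inspected.
CARBAMOYLTRANSFERASE = re.compile(r"F.([EK]).S[GT]RT")


def classify_carbamoyltransferase(sequence):
    """Return 'ATCase', 'OTCase', or None if the signature is absent.

    Position 3 of the signature is Glu (E) in ATCases and Lys (K) in OTCases.
    """
    match = CARBAMOYLTRANSFERASE.search(sequence.upper())
    if match is None:
        return None
    return "ATCase" if match.group(1) == "E" else "OTCase"


if __name__ == "__main__":
    # Hypothetical fragments, not taken from the patent's sequence listing.
    for fragment in ("AAFFEPSTRTRL", "AAFEKNSTRTRL", "AAAA"):
        print(fragment, classify_carbamoyltransferase(fragment))
```

A signature hit alone does not establish enzyme function, so the result is best treated as a first-pass annotation.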

[1081] 378. Oleosins signature 

Oleosins [1] are the proteinaceous components of plants' lipid storage bodies called oil bodies. Oil bodies are small
droplets (0.2 to 1.5 mu-m in diameter) containing mostly triacylglycerol that are surrounded by a phospholipid/oleosin
annulus. Oleosins may have a structural role in stabilizing the lipid body during desiccation of the seed, by preventing
coalescence of the oil. They may also provide recognition signals for specific lipase anchorage in lipolysis during
seedling growth. Oleosins are found in the monolayer lipid/water interface of oil bodies and probably interact with both
the lipid and phospholipid moieties.

Oleosins are proteins of 16 Kd to 24 Kd and are composed of three domains: an N-terminal hydrophilic region of
variable length (from 30 to 60 residues), a central hydrophobic domain of about 70 residues and a C-terminal amphip-
athic region of variable length (from 60 to 100 residues). The central hydrophobic domain is proposed to be made up
of beta-strand structure and to interact with the lipids [2]. It is the only domain whose sequence is conserved and
therefore a section from that domain was selected as a signature pattern.

[1082] Consensus pattern: [AG]-[ST]-x(2)-[AG]-x(2)-[LIVM]-[SAD]-T-P-[LIVMF](4)-F-S-P-[LIVM](3)-P-A

[1] Murphy D.J., Keen J.N., O'Sullivan J.N., Au D.M.Y., Edwards E.-W., Jackson P.J., Cummins I., Gibbons T.,
Shaw C.H., Ryan A.J. Biochim. Biophys. Acta 1088:86-94(1991).
[2] Tzen J.T.C., Lie G.C., Huang A.H.C. J. Biol. Chem. 267:15626-15634(1992).

[1083] 379. (Orbi_VP5) Orbivirus outer capsid protein VP5

[1084] This paper shows the location of the different capsid proteins and their relation to each other.
[1085] [1] Schoehn G, Moss SR, Nuttall PA, Hewat EA; Virology 1997;235:191-200.
[1086] 380. Orn/DAP/Arg decarboxylases family 2 signatures

Pyridoxal-dependent decarboxylases acting on ornithine, lysine, arginine and related substrates can be classified into
two different families on the basis of sequence similarities [1,2,3]. The second family consists of:

- Eukaryotic ornithine decarboxylase (EC 4.1.1.17) (ODC). ODC catalyzes the transformation of ornithine into pu-
trescine.
- Prokaryotic diaminopimelic acid decarboxylase (EC 4.1.1.20) (DAPDC). DAPDC catalyzes the conversion of di-
aminopimelic acid into lysine, the last step in the biosynthesis of lysine.
- Pseudomonas syringae pv. tabaci protein tabA. tabA is probably involved in the biosynthesis of tabtoxin and is
highly similar to DAPDC.
- Bacterial and plant biosynthetic arginine decarboxylase (EC 4.1.1.19) (ADC). ADC catalyzes the transformation
of arginine into agmatine, the first step in the biosynthesis of putrescine from arginine.

The above proteins, while most probably evolutionarily related, do not share extensive regions of sequence similarity.
Two of the conserved regions were selected as signature patterns. The first pattern contains a conserved lysine residue
which is known, in mouse ODC [4], to be the site of attachment of the pyridoxal-phosphate group. The second pattern
contains a stretch of three consecutive glycine residues and has been proposed to be part of a substrate-binding region
[5].

These enzymes are collectively known as group IV decarboxylases [3].

[1087] Consensus pattern: [FY]-[PA]-x-K-[SACV]-[NHCLFW]-x(4)-[LIVMF]-[LIVMTA]-x(2)-[LIVMA]-x(3)-[GTE] [K is
the pyridoxal-P attachment site]

Consensus pattern: [GS]-x(2,6)-[LIVMSCP]-x(2)-[LIVMF]-[DNS]-[LIVMCA]-G-G-G-[LIVMFY]-[GSTPCEQ]

[1] Bairoch A. Unpublished observations (1993).

[2] Martin C., Cami B., Yeh P., Stragier P., Parsot C., Patte J.-C. Mol. Biol. Evol. 5:549-559(1988).
[3] Sandmeier E., Hale T.I., Christen P. Eur. J. Biochem. 221:997-1002(1994).

[4] Poulin R., Lu L., Ackermann B., Bey P., Pegg A.E. J. Biol. Chem. 267:150-158(1992).
[5] Moore R.C., Boyle S.M. J. Bacteriol. 172:4631-4640(1990).

[1088] 381. Osteopontin signature 

Osteopontin is an acidic phosphorylated glycoprotein of about 40 Kd which is abundant in the mineral matrix of bones
and which binds tightly to hydroxyapatite [1,2,3]. It is suggested that osteopontin might function as a cell attachment
factor and could play a key role in the adhesion of osteoclasts to the mineral matrix of bone.
Osteopontin-K is a kidney protein which is highly similar to osteopontin and probably also involved in cell-adhesion.
As a signature pattern a highly conserved region located at the N-terminal extremity of the mature protein was selected.

[1089] Consensus pattern: [KQ]-x-[TA]-x(2)-[GA]-S-S-E-E-K

[1] Butler W.T. Connect. Tissue Res. 23:123-136(1989).
[2] Gorski J.P. Calcif. Tissue Int. 50:391-396(1992).
[3] Denhardt D.T., Guo X. FASEB J. 7:1475-1482(1993).
[1090] 382. Oxysterol-binding protein family signature 

A number of eukaryotic proteins that seem to be involved with sterol synthesis and/or its regulation have been found
[1] to be evolutionarily related:

- Mammalian oxysterol-binding protein (OSBP). A protein of about 800 amino-acid residues that binds a variety of
oxysterols (oxygenated derivatives of cholesterol). OSBP seems to play a complex role in the regulation of sterol
metabolism.
- Yeast proteins HES1 and KES1: highly related proteins of 434 residues that seem to play a role in ergosterol
synthesis.
- Yeast OSH1, a protein of 859 residues that also plays a role in ergosterol synthesis.
- Yeast hypothetical protein YHR001w (437 residues).
- Yeast hypothetical protein YHR073w (996 residues).
- Yeast hypothetical protein YKR003w (448 residues).

[1091] All these proteins contain a moderately conserved domain of about 250 residues located in the C-terminal
half of OSBP, OSH1 and YHR073w and in the central section of the other proteins. As a signature pattern, the best
conserved part of this domain was selected, a region that contains a conserved pentapeptide.
[1092] Consensus pattern: E-[KQ]-x-S-H-[HR]-P-P-x-[STACF]-A

[1093] [1] Jiang B., Brown J.L., Sheraton J., Fortin N., Bussey H. Yeast 10:341-353(1994).
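
The consensus patterns in this description follow the PROSITE notation: `x` is any residue, `[..]` lists allowed residues, `{..}` lists excluded residues, and `(n)` or `(n,m)` gives a repeat count. As a minimal sketch of how such a pattern can be applied to a sequence, the function below translates it into a Python regular expression. The function name and the test sequence are hypothetical, and anchors (`<`, `>`) and other less common notation are not handled.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regex string."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        # Separate an optional repeat count such as "(2)" or "(2,6)".
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.group(1), m.group(2), m.group(3)
        if core == "x":
            core = "."                       # x: any residue
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"   # {..}: any residue except these
        # "[..]" classes and single residue letters are already valid regex.
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)


if __name__ == "__main__":
    # Oxysterol-binding protein family signature as given in the text.
    regex = prosite_to_regex("E-[KQ]-x-S-H-[HR]-P-P-x-[STACF]-A")
    print(regex)
    # Hypothetical test sequence, not taken from the sequence listing.
    hit = re.search(regex, "MAAEKLSHHPPTSASG")
    print(hit.span() if hit else None)
```

`re.search` reports only the first hit; `re.finditer` can be used where a sequence may contain several occurrences.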

[1094] 383. FMN oxidoreductase 

[1095] 384. Oxidoreductase FAD/NAD-binding domain 

Number of members: 250
[1]

Medline: 82084635

The sequence of squash NADH:nitrate reductase and its relationship to the sequences of other flavoprotein oxidore-
ductases. A family of flavoprotein pyridine nucleotide cytochrome reductases.
Hyde GE, Crawford NM, Campbell WH;
J Biol Chem 1991;266:23542-23547.

[2] Medline: 95111952

Crystal structure of the FAD-containing fragment of corn nitrate reductase at 2.5 A resolution: relationship to other
flavoprotein reductases.
Lu G, Campbell WH, Schneider G, Lindqvist Y;
Structure 1994;2:809-821.

[1096] 385. (oxidored_molyb) Eukaryotic molybdopterin oxidoreductases signature
A number of different eukaryotic oxidoreductases that require and bind a molybdopterin cofactor have been shown [1]
to share a few regions of sequence similarity. These enzymes are:

- Xanthine dehydrogenase (EC 1.1.1.204), which catalyzes the oxidation of xanthine to uric acid with the concomitant
reduction of NAD. Structurally, this enzyme of about 1300 amino acids consists of at least three distinct domains:
an N-terminal 2Fe-2S ferredoxin-like iron-sulfur binding domain (see <PDOC00175>), a central FAD/NAD-binding
domain and a C-terminal Mo-pterin domain.
- Aldehyde oxidase (EC 1.2.3.1), which catalyzes the oxidation of aldehydes into acids. Aldehyde oxidase is highly
similar to xanthine dehydrogenase in its sequence and domain structure.
- Nitrate reductase (EC 1.6.6.1), which catalyzes the reduction of nitrate to nitrite. Structurally, this enzyme of about
900 amino acids consists of an N-terminal Mo-pterin domain, a central cytochrome b5-type heme-binding domain
(see <PDOC00170>) and a C-terminal FAD/NAD-binding cytochrome reductase domain.
- Sulfite oxidase (EC 1.8.3.1), which catalyzes the oxidation of sulfite to sulfate. Structurally, this enzyme of about
460 amino acids consists of an N-terminal cytochrome b5-binding domain followed by a Mo-pterin domain.

There are a few conserved regions in the sequence of the molybdopterin-binding domain of these enzymes. The pattern
used to detect these proteins is based on one of them. It contains a cysteine residue which could be involved in binding
the molybdopterin cofactor.

[1097] Consensus pattern: [GA]-x(3)-[KRNQHT]-x(11,14)-[LIVMFYWS]-x(8)-[LIVMF]-x-C-x(2)-[DEN]-R-x(2)-[DE]
[1098] [1] Wootton J.C., Nicolson R.E., Cock J.M., Walters D.E., Burke J.F., Doyle W.A., Bray R.C. Biochim. Biophys.
Acta 1057:157-185(1991).

[1099] 386. (Oxidored_q1) NADH-Ubiquinone/plastoquinone (complex I), various chains
This family is part of complex I which catalyses the transfer of two electrons from NADH to ubiquinone in a reaction
that is associated with proton translocation across the membrane. Number of members: 1824

[1]

Medline: 93110040

The NADH:ubiquinone oxidoreductase (complex I) of respiratory chains. Walker JE;
Q Rev Biophys 1992;25:253-324.
[1100] 387. (oxidored_q3) NADH-ubiquinone/plastoquinone oxidoreductase chain 6. 179 members.
[1101] 388. (oxidored_q5) NADH-ubiquinone oxidoreductase chain 4, amino terminus
[1102] [1] Walker JE; Q Rev Biophys 1992;25:253-324.

[1103] 389. (oxidored_q6) Respiratory-chain NADH dehydrogenase 20 Kd subunit signature
Respiratory-chain NADH dehydrogenase (EC 1.6.5.3) [1,2] (also known as complex I or NADH-ubiquinone
oxidoreductase) is an oligomeric enzymatic complex located in the inner mitochondrial membrane which also seems
to exist in the chloroplast and in cyanobacteria (as a NADH-plastoquinone oxidoreductase). Among the 25 to 30
polypeptide subunits of this bioenergetic enzyme complex there is one with a molecular weight of 20 Kd (in mammals)
[3], which is a component of the iron-sulfur (IP) fragment of the enzyme. It seems to bind a 4Fe-4S iron-sulfur cluster.
The 20 Kd subunit has been found to be:

- Nuclear encoded, as a precursor form with a transit peptide, in mammals and in Neurospora crassa.
- Mitochondrial encoded in Paramecium (gene psbG).
- Chloroplast encoded in various higher plants (gene ndhK or psbG).

The 20 Kd subunit is highly similar to [4]:

- Synechocystis strain PCC 6803 proteins psbG1 and psbG2.
- Subunit B of Escherichia coli NADH-ubiquinone oxidoreductase (gene nuoB).
- Subunit NQO6 of Paracoccus denitrificans NADH-ubiquinone oxidoreductase.
- Subunit 7 of Escherichia coli formate hydrogenlyase (gene hycG).
- Subunit I of Escherichia coli hydrogenase-4 (gene hyfI).

As a signature pattern a highly conserved region was selected, located in the central section of this subunit and which
contains a conserved cysteine that is probably involved in the binding of the 4Fe-4S center.

[1104] Consensus pattern: [GN3-x-D-[EASTHL!V'lV!Fp}-p.[IV}-D-[L!VMFY\M(25-x-P--x-C-P-[PT] [The C is a putative 
4Fe-4S ligand] 

[1] Ragan C.I. Curr. Top. Bioenerg. 15:1-36(1987).

[2] Weiss H., Friedrich T., Hofhaus G., Preis D. Eur. J. Biochem. 197:563-576(1991).

[3] Arizmendi J.M., Runswick M.J., Skehel J.M., Walker J.E. FEBS Lett. 301:237-242(1992).

[4] Weidner U., Geier S., Ptock A., Friedrich T., Leif H., Weiss H. J. Mol. Biol. 233:109-122(1993).

[1105] 390. p53 tumor antigen signature

The p53 tumor antigen [1,2] is a protein found in increased amounts in a wide variety of transformed cells.
It is also detectable in many proliferating nontransformed cells, but it is undetectable or present at low levels in resting
cells. It is frequently mutated or inactivated in many types of cancer. p53 seems to act as a tumor suppressor in some,
but probably not all, tumor types. p53 is probably involved in cell cycle regulation and may be a trans-activator that
acts to negatively regulate cellular division by controlling a set of genes required for this process.

p53 is a phosphoprotein of about 390 amino acids which can be subdivided into four domains: a highly charged acidic
region of about 75 to 80 residues, a hydrophobic proline-rich domain (position 80 to 150), a central region (from 150
to about 300) and a highly basic C-terminal region. The sequence of p53 is well conserved in vertebrate species;
attempts to identify p53 in other eukaryotic phyla have so far been unsuccessful.

As a signature pattern for p53, a perfectly conserved stretch of 13 residues located in the central region of the protein
was selected. This region, known as domain IV in [3], is involved (along with an adjacent region) in the binding of the
large T antigen of SV40. In man this region is the focus of a variety of point mutations in cancerous tumors.
[1106] Consensus pattern: M-C-N-S-S-C-M-G-G-M-N-R-R



[1] Levine A.J., Momand J., Finlay C.A. Nature 351:453-456(1991).
[2] Levine A.J., Momand J. Biochim. Biophys. Acta 1032:119-136(1990).
[3] Soussi T., Caron de Fromentel C., May P. Oncogene 5:945-952(1990).
[4] Lane D.P., Benchimol S. Genes Dev. 4:1-8(1990).
[5] Ullrich S.J., Anderson C.W., Mercer W.E., Appella E. J. Biol. Chem. 267:15259-15262(1992).
[1107] 391. (P5CR) Delta 1-pyrroline-5-carboxylate reductase signature

Delta 1-pyrroline-5-carboxylate reductase (P5CR) (EC 1.5.1.2) [1,2] is the enzyme that catalyzes the terminal step in
the biosynthesis of proline from glutamate, the NAD(P)-dependent oxidation of 1-pyrroline-5-carboxylate into proline.
The sequences of P5CR from eubacteria (gene proC), archaebacteria and eukaryotes show only a moderate level of
overall similarity. As a signature pattern the best conserved region, located in the C-terminal section of P5CR, was
selected.

[1108] Concensus pattern (PALf : |-x t > 3) ll !V j-xu^-jt.^ MJ-[&"!^C |-[c3 f V ]■ a-(GaN f G-\- ]'■*!,?)■] AGj-[l. IV) x<2)- 
« [LMF]-[DENQK] 

[1] Delauney A.J., Verma D.P.S. Mol. Gen. Genet. 221:299-305(1990).

[2] Savioz A., Jeenes D.J., Kocher H.P., Haas D. Gene 86:107-111(1990).

[1109] 392. Poly-adenylate binding protein, unique domain

[1110] 393. (PAL) Phenylalanine and histidine ammonia-lyases active site

Phenylalanine ammonia-lyase (EC 4.3.1.5) (PAL) is a key enzyme of plant and fungi phenylpropanoid metabolism
which is involved in the biosynthesis of a wide variety of secondary metabolites such as flavanoids, furanocoumarin
phytoalexins and cell wall components. These compounds have many important roles in plants during normal growth
and in responses to environmental stress. PAL catalyzes the removal of an ammonia group from phenylalanine to form
trans-cinnamate.

Histidine ammonia-lyase (EC 4.3.1.3) (histidase) catalyzes the first step in histidine degradation, the removal of an
ammonia group from histidine to produce urocanic acid.

The two types of enzymes are functionally and structurally related [1]. They are the only enzymes which are known to
have the modified amino acid dehydroalanine (DHA) in their active site. A serine residue has been shown [2,3,4] to be
the precursor of this essential electrophilic moiety. The region around this active site residue is well conserved and
can be used as a signature pattern.

[1111] Consensus pattern: G-[STG]-[LIVM]-[STG]-[AC]-S-G-[DH]-L-x-P-L-[SA]-x(2)-[SA] [S is the active site residue]

[1] Taylor R.G., Lambert M.A., Sexsmith E., Sadler S.J., Ray P.N., Mahuran D.J., McInnes R.R. J. Biol. Chem.
265:18192-18199(1990).

[2] Langer M., Reck G., Reed J., Retey J. Biochemistry 33:6462-6467(1994).

[3] Schuster B., Retey J. FEBS Lett. 349:252-254(1994).

[4] Taylor R.G., McInnes R.R. J. Biol. Chem. 269:27473-27477(1994).

[1112] 394. PAS domain 

CAUTION. This family does not currently match all known examples of PAS domains.
PAS motifs appear in archaea, eubacteria and eukarya. Probably
the most surprising identification of a PAS domain was that in
EAG-like K+-channels [1,3].
Number of members: 308
[1]

Medline: 97446881

PAS domain S-boxes in archaea, bacteria and sensors for oxygen and redox.
Zhulin IB, Taylor BL, Dixon R;

Trends Biochem Sci 1997;22:331-333.
[2] Medline: 95275818

1.4 A structure of photoactive yellow protein, a cytosolic photoreceptor: unusual fold, active site, and chromophore.
Borgstahl GE, Williams DR, Getzoff ED;
Biochemistry 1995;34:6278-6287.

[3] Medline: 98044337
PAS: a multifunctional domain family comes to light.
Ponting CP, Aravind L;
Curr Biol 1997;7:674-677.
[1113] 395. (PBP) Phosphatidylethanolamine-binding protein family signature

Mammalian phosphatidylethanolamine-binding protein (also known as basic cytosolic 21 Kd protein) is a 166 residue
protein found in a variety of tissues [1]. It binds hydrophobic ligands, such as phosphatidylethanolamine, but also seems
[2] to bind nucleotides such as GTP and FMN. It is suggested that it could act in membrane remodeling during growth






and maturation. This protein belongs to a family that also includes:

- Drosophila antennal protein A5, a putative odorant-binding protein.
- Onchocerca volvulus antigen Ov-16 and the related proteins D1, D2 and D3.
- Plasmodium falciparum putative phosphatidylethanolamine-binding protein.
- Toxocara canis secreted antigen TES-26. This larval protein has been shown to bind phosphatidylethanolamine.
- Yeast protein DKA1 (also known as NSP1 or TFS1). The function of this protein is not yet clear.
- Yeast hypothetical protein YLR179c.
- Caenorhabditis elegans hypothetical protein F40A3.3.

As a signature pattern the best conserved region was selected, which is located in the end of the first third of the
sequence of these proteins.

[1114} C ons-niUi. pattern [F Yi.j-x-3LvHUVf-j-x-3Tl v]-3DC }-P 0 < P-[SNJ-« IOi H 

[1] Seddiqi N., Bollengier F., Alliel P.M., Perin J.P., Bonnet F., Bucquoy S., Jolles P., Schoentgen F. J. Mol. Evol.
39:655-660(1994).

[2] Schoentgen F., Jolles P. FEBS Lett. 369:22-26(1995).

[1115] 396. PCI domain
This domain has also been called the PINT motif (Proteasome,
Int-6, Nip-1 and TRIP-15) [1].
Number of members: 49
[1]

Medline: 98308842
The PCI domain: a common theme in three multiprotein complexes.
Hofmann K, Bucher P;

Trends Biochem Sci 1998;23:204-205.

[2] Medline: 98266368

Homologues of 26S proteasome subunits are regulators of transcription and translation.

Aravind L, Ponting CP;

Protein Sci 1998;7:1250-1254.
[1116] 397. (PCMT) Protein-L-isoaspartate (D-aspartate) O-methyltransferase signature
Protein-L-isoaspartate (D-aspartate) O-methyltransferase (EC 2.1.1.77) (PCMT) [1] (which is also known as
L-isoaspartyl protein carboxyl methyltransferase) is an enzyme that catalyzes the transfer of a methyl group from
S-adenosylmethionine to the free carboxyl groups of D-aspartyl or L-isoaspartyl residues in a variety of peptides and
proteins. The enzyme does not act on normal L-aspartyl residues. L-isoaspartyl and D-aspartyl are the products of the
spontaneous deamidation and/or isomerization of normal L-aspartyl and L-asparaginyl residues in proteins. PCMT
plays a role in the repair and/or degradation of these damaged proteins; the enzymatic methyl esterification of the
abnormal residues can lead to their conversion to normal L-aspartyl residues. PCMT is a well-conserved and widely
distributed cytosolic protein of about 24 Kd. As a signature pattern, a conserved region in the central part of this
enzyme has been developed.
[1117] Consensus pattern: [GSA]-D-G-x(2)-G-[FYWV]-x(3)-[AS]-P-[FY]-[DN]-x-I

[1118] [1] Kagan R.M., McFadden H.J., McFadden P.N., O'Connor C., Clarke S. Comp. Biochem. Physiol. 117B:379-385(1997).

[1119] 398. (PCNA) Proliferating cell nuclear antigen signatures
Proliferating cell nuclear antigen (PCNA) [1,2] is a protein involved in DNA replication by acting as a cofactor for DNA polymerase delta, the polymerase responsible for leading strand DNA replication.

A similar protein exists in yeast (gene POL30) [3] and is associated with polymerase III, the yeast analog of polymerase delta. In baculoviruses the ETL protein has been shown [4] to be highly related to PCNA and is probably associated with the viral encoded DNA polymerase. A homolog of PCNA is also found in archaebacteria.
As signatures for this family of proteins, two conserved regions were selected, both located in the N-terminal section. The second one has been proposed to bind DNA.

[1120] Consensus puttetn [GA3-[LIVMF]-y-[LIV MAj-<-[SA\/3-[UVM1-D-x-[N5A&]-[Hh RHVI]-Y-[LV]-[VGA]-Y-[LIVMj- 
x-[LIVM]-x(4)-F 

- Consensus pattern: [RKA]-C-[DE]-[RH]-x(3)-[LIVMF]-x(3)-[LIVM]-x-[SGAN]-[LIVMF]-x-K-[LIVMF](2)
[1] Bravo R., Frank R., Blundell P.A., Macdonald-Bravo H. Nature 326:515-517(1987).

[2] Suzuka I., Hata S., Matsuoka M., Kosugi S., Hashimoto J. Eur. J. Biochem. 195:571-575(1991).

[3] Bauer G.A., Burgers P.M.J. Nucleic Acids Res. 18:261-265(1990).

[4] O'Reilly D.R., Crawford A.M., Miller L.K. Nature 337:606-606(1989).

[1121] 399. (PDT) Prephenate dehydratase signatures

Prephenate dehydratase (EC 4.2.1.51) (PDT) catalyzes the decarboxylation of prephenate into phenylpyruvate in microorganisms. PDT is involved in the terminal pathway of the biosynthesis of phenylalanine. In some bacteria, such as Escherichia coli, PDT is part of a bifunctional enzyme (P-protein) that also catalyzes the transformation of chorismate into prephenate (chorismate mutase), while in other bacteria it is a monofunctional enzyme. The sequences of monofunctional PDT align well with the C-terminal part of that of P-proteins [1].

As signature patterns for PDT, two conserved regions were selected. The first region contains a conserved threonine which has been said to be essential for the activity of the enzyme in E. coli. The second region includes a conserved glutamate. Both regions are in the C-terminal part of PDT.

[1122] Consensus pattern: [FY]-x-[LIVM]-x(2)-[LIVM]-x(5)-[DN]-x(5)-T-R-F-[LIVMW]-x-[LIVM]
[1123] [1] Fischer R.S., Zhao G., Jensen R.A. J. Gen. Microbiol. 137:1293-1301(1991).
[1124] 400. PDZ domain (also known as DHR or GLGF).
[1125] PDZ domains are found in diverse signaling proteins. 
[1126] [1] Ponting CP, Phillips C, Davies KE, Blake DJ; Bioessays 1997;19:469-479.

[2] Doyle DA, Lee A, Lewis J, Kim E, Sheng M, MacKinnon R; Cell 1996;85:1067-1076.

[3] Ponting CP; Protein Sci 1997;6:464-468.

[1127] 401. (PPDK_N_term) PEP-utilizing enzymes signatures

A number of enzymes that catalyze the transfer of a phosphoryl group from phosphoenolpyruvate (PEP) via a phosphohistidine intermediate have been shown to be structurally related [1,2,3,4]. These enzymes are:

- Pyruvate, orthophosphate dikinase (EC 2.7.9.1) (PPDK). PPDK catalyzes the reversible phosphorylation of pyruvate and phosphate by ATP to PEP and diphosphate. In plants PPDK functions in the direction of the formation of PEP, which is the primary acceptor of carbon dioxide in C4 and crassulacean acid metabolism plants. In some bacteria, such as Bacteroides symbiosus, PPDK functions in the direction of ATP synthesis.
- Phosphoenolpyruvate synthase (EC 2.7.9.2) (pyruvate, water dikinase). This enzyme catalyzes the reversible phosphorylation of pyruvate by ATP to form PEP, AMP and phosphate, an essential step in gluconeogenesis when pyruvate and lactate are used as a carbon source.
- Phosphoenolpyruvate-protein phosphotransferase (EC 2.7.3.9). This is the first enzyme of the phosphoenolpyruvate-dependent sugar phosphotransferase system (PTS), a major carbohydrate transport system in bacteria. The PTS catalyzes the phosphorylation of incoming sugar substrates concomitant with their translocation across the cell membrane. The general mechanism of the PTS is the following: a phosphoryl group from PEP is transferred to enzyme-I (EI) of PTS, which in turn transfers it to a phosphoryl carrier protein (HPr). Phospho-HPr then transfers the phosphoryl group to a sugar-specific permease.

All these enzymes share the same catalytic mechanism: they bind PEP and transfer the phosphoryl group from it to a histidine residue. The sequence around that residue is highly conserved and can be used as a signature pattern for these enzymes. As a second signature pattern, a conserved region was selected in the C-terminal part of the PEP-utilizing enzymes. The biological significance of this region is not yet known.

[1128] Consensus pattern: G-[GA]-x-[TN]-x-H-[STA]-[STAV]-[LIVM](2)-[STAV]-[RG] [H is phosphorylated]

- Consensus pattern: [DEQSKj-x-[LI\MF>S-[LIVMF>GH£T3^^ 

[GAS]-x(2)-R 

[1] Reizer J., Hoischen C., Reizer A., Pham T.N., Saier M.H. Jr. Protein Sci. 2:506-521(1993).

[2] Reizer J., Reizer A., Merrick M.J., Plunkett G. III, Rose D.J., Saier M.H. Jr. Gene 181:103-108(1996).

[3] Pocalyko D.J., Carroll L.J., Martin B.M., Babbitt P.C., Dunaway-Mariano D. Biochemistry 29:10757-10765(1990).

[4] Niersbach M., Kreuzaler F., Geerse R.H., Postma P., Hirsch H.J. Mol. Gen. Genet. 232:332-336(1992).
[1129] 402. (PEPCK_ATP) Phosphoenolpyruvate carboxykinase (ATP) signature

Phosphoenolpyruvate carboxykinase (ATP) (EC 4.1.1.49) (PEPCK) [1] catalyzes the formation of phosphoenolpyruvate by decarboxylation of oxaloacetate while hydrolyzing ATP, a rate limiting step in gluconeogenesis (the biosynthesis of glucose).

The sequence of this enzyme has been obtained from Escherichia coli, yeast, and Trypanosoma brucei; these three sequences are evolutionary related and share many regions of similarity. As a signature pattern, a highly conserved region was selected that contains four acidic residues and which is located in the central part of the enzyme. The beginning of the pattern is located about 10 residues to the C-terminus of an ATP-binding motif 'A' (P-loop) (see <PDOC00017>) and is also part of the ATP-binding domain [2].
[1130] Consensus pattern: L-I-G-D-D-E-H-x-W-x-[DE]-x-G-[IV]-x-N

Note: phosphoenolpyruvate carboxykinase (GTP) (EC 4.1.1.32), an enzyme that catalyzes the same reaction but using GTP instead of ATP, is not related to the above enzyme (see <PDOC00421>).

[1] Medina V., Pontarollo R., Glaeske D., Tabel H., Goldie H. J. Bacteriol. 172:7151-7156(1990).
[2] Matte A., Goldie H., Sweet R.M., Delbaere L.T.J. J. Mol. Biol. 256:126-143(1996).

[1131] 403. (PEPcase) Phosphoenolpyruvate carboxylase active sites. Phosphoenolpyruvate carboxylase (EC 4.1.1.31) (PEPcase) catalyzes the irreversible beta-carboxylation of phosphoenolpyruvate by bicarbonate to yield oxaloacetate and phosphate. The enzyme is found in all plants and in a variety of microorganisms. A histidine [1] and a lysine [2] have been implicated in the catalytic mechanism of this enzyme; the regions around these active site residues are highly conserved in PEPcase from various plants, bacteria and cyanobacteria and can be used as signature patterns for this type of enzyme.

[1132] Consensus pattern: [VT]-x-T-A-H-P-T-[EQ]-x(2)-R-[KRH] [H is an active site residue]
- Consensus pattern: [IV]-M-[LIVM]-G-Y-S-D-S-x-K-D-[STAG]-G [K is an active site residue]
[1133] [1] Terada K., Izui K. Eur. J. Biochem. 202:797-803(1991). [2] Jiao J.-A., Podesta F.E., Chollet R., O'Leary M.H., Andreo C.S. Biochim. Biophys. Acta 1041:291-295(1990).
[1134] 404. PET112 family signature

The following proteins from eukaryotes, prokaryotes and archaebacteria belong to the same family:

- Yeast mitochondrial protein PET112 [1], which plays an unknown role in the expression of mitochondrial genes, probably at the level of translation.
- Aspergillus nidulans mitochondrial protein nempA.
- Bacillus subtilis hypothetical protein yzdD.
- Moraxella catarrhalis hypothetical protein in bloR-1 3'region.
- Mycoplasma genitalium hypothetical protein MG100.
- Methanococcus jannaschii hypothetical proteins MJ0019 and MJ0180.

The sizes of these proteins range from 419 to 630 amino acids. As a signature pattern, a conserved region located in the N-terminal section was selected.
[1135] Consensus pattern: [DN]-x-[DN]-R-x(3)-P-L-[LIV]-E-[LIV]-x-[ST]-x-P

[1136] [1] Mulero J.J., Rosenthal J.K., Fox T.D. Curr. Genet. 25:299-304(1994).
[1137] 405. (PFK) Phosphofructokinase signature

Phosphofructokinase (EC 2.7.1.11) (PFK) [1,2] is a key regulatory enzyme in the glycolytic pathway. It catalyzes the phosphorylation by ATP of fructose 6-phosphate to fructose 1,6-bisphosphate. In bacteria PFK is a tetramer of identical 36 Kd subunits. In mammals it is a tetramer of 80 Kd subunits. Each 80 Kd subunit consists of two homologous domains which are highly related to the bacterial 36 Kd subunits. In human there are three tissue-specific types of PFK isozymes: PFKM (muscle), PFKL (liver), and PFKP (platelet). In yeast PFK is an octamer composed of four 100 Kd alpha chains (gene PFK1) and four 100 Kd beta chains (gene PFK2); like the mammalian 80 Kd subunits, the yeast 100 Kd subunits are composed of two homologous domains.

As a signature pattern for PFK, a region that contains three basic residues involved in fructose-6-phosphate binding was selected.

[1138] Consensus pattern: [RK]-x(4)-G-H-x-Q-[QR]-G-G-x(5)-D-R [The R/K, the H and the Q/R are involved in fructose-6-P binding]

- Note: Escherichia coli has two phosphofructokinase isozymes which are encoded by genes pfkA (major) and pfkB (minor). The pfkB isozyme is not evolutionary related to other prokaryotic or eukaryotic PFK's (see <PDOC00504>).

[1] Poorman R.A., Randolph A., Kemp R.G., Heinrikson R.L. Nature 309:467-469(1984).

[2] Heinisch J., Ritzel R.G., von Borstel R.C., Aguilera A., Rodicio R., Zimmermann F.K. Gene 78:309-321(1989).

[1139] 406. (PGAM) Phosphoglycerate mutase family phosphohistidine signature

Phosphoglycerate mutase (EC 5.4.2.1) (PGAM) and bisphosphoglycerate mutase (EC 5.4.2.4) (BPGM) are structurally related enzymes which catalyze reactions involving the transfer of phospho groups between the three carbon atoms of phosphoglycerate [1,2]. Both enzymes can catalyze three different reactions, although in different proportions:

- The isomerization of 2-phosphoglycerate (2-PGA) to 3-phosphoglycerate (3-PGA) with 2,3-diphosphoglycerate (2,3-DPG) as the primer of the reaction.
- The synthesis of 2,3-DPG from 1,3-DPG with 3-PGA as a primer.
- The degradation of 2,3-DPG to 3-PGA (phosphatase EC 3.1.3.13 activity).

In mammals, PGAM is a dimeric protein. There are two isoforms of PGAM: the M (muscle) and B (brain) forms. In yeast, PGAM is a tetrameric protein. BPGM is a dimeric protein and is found mainly in erythrocytes, where it plays a major role in regulating hemoglobin oxygen affinity as a consequence of controlling 2,3-DPG concentration.

The catalytic mechanism of both PGAM and BPGM involves the formation of a phosphohistidine intermediate [3].
The bifunctional enzyme 6-phosphofructo-2-kinase / fructose-2,6-bisphosphatase (EC 2.7.1.105 and EC 3.1.3.46) (PF2K) [4] catalyzes both the synthesis and the degradation of fructose-2,6-bisphosphate. PF2K is an important enzyme in the regulation of hepatic carbohydrate metabolism. Like PGAM/BPGM, the fructose-2,6-bisphosphatase reaction involves a phosphohistidine intermediate, and the phosphatase domain of PF2K is structurally related to PGAM/BPGM.

The bacterial enzyme alpha-ribazole-5'-phosphate phosphatase (gene cobC), which is involved in cobalamin biosynthesis, also belongs to this family [5].

A signature pattern was built around the phosphohistidine residue.
[1140] Consensus pattern: [LIVM]-x-R-H-G-[EQ]-x(3)-N [H is the phosphohistidine residue]

Note: some organisms harbor a form of PGAM independent of 2,3-DPG; this enzyme is not related to the family
described above [6].

[1] Le Boulch P., Joulin V., Garel M.-C., Rosa J., Cohen-Solal M. Biochem. Biophys. Res. Commun. 158:874-881(1988).

[2] White M.F., Fothergill-Gilmore L.A. FEBS Lett. 229:383-387(1988).
[3] Rose Z.B. Meth. Enzymol. 87:43-51(1982).

[4] Bazan J.F., Fletterick R.J., Pilkis S.J. Proc. Natl. Acad. Sci. U.S.A. 86:9642-9646(1989).
[5] O'Toole G.A., Trzebiatowski J.R., Escalante-Semerena J.C. J. Biol. Chem. 269:26503-26511(1994).

[6] Grana X., De Lecea L., El-Maghrabi M.R., Urena J.M., Caellas C., Carreras J., Puigdomenech P., Pilkis S.J.,
Climent F. J. Biol. Chem. 267:12787-12803(1992).

[1141] 407. (PGI) Phosphoglucose isomerase signatures

Phosphoglucose isomerase (EC 5.3.1.9) (PGI) [1,2] is a dimeric enzyme that catalyzes the reversible isomerization of glucose-6-phosphate and fructose-6-phosphate. PGI is involved in different pathways: in most higher organisms it is involved in glycolysis; in mammals it is involved in gluconeogenesis; in plants in carbohydrate biosynthesis; in some bacteria it provides a gateway for fructose into the Entner-Doudoroff pathway. PGI has been shown [3] to be identical to neuroleukin, a neurotrophic factor which supports the survival of various types of neurons.

The sequence of PGI from many species ranging from bacteria to mammals is available and has been shown to be highly conserved. As signature patterns for this enzyme, two conserved regions were selected; the first region is located in the central section of PGI, while the second one is located in its C-terminal section.
[1142] Consensus pattern: [DENS]-x-[LIVM]-G-G-R-[FY]-S-[LIVM]-x-[STA]-[PSAC]-[LIVMA]-G

- Consensus pattern: [GS]-x-[LIVM]-[LIVMFYW]-x(4)-[FY]-[DN]-Q-x-G-V-E-x(2)-K

[1] Achari A., Marshall S.E., Muirhead H., Palmieri R.H., Noltmann E.A. Philos. Trans. R. Soc. Lond., B, Biol.
Sci. 293:145-157(1981).

[2] Smith M.W., Doolittle R.F. J. Mol. Evol. 34:544-545(1992).
[3] Faik P., Walker J.I.H., Redmill A.A.M., Morgan M.J. Nature 332:455-456(1988).

[1143] 408. (PGK) Phosphoglycerate kinase signature

Phosphoglycerate kinase (EC 2.7.2.3) (PGK) [1] catalyzes the second step in the second phase of glycolysis, the reversible conversion of 1,3-diphosphoglycerate to 3-phosphoglycerate with generation of one molecule of ATP. PGK is found in all living organisms and its sequence has been highly conserved throughout evolution. It is a two-domain protein; each domain is composed of six repeats of an alpha/beta structural motif. As a signature pattern for PGK's, a conserved region in the N-terminal region was selected.

Consensus pattern: [KRHGTCVN]-[VT]-[LIVMF]-[LIVMC]-R-x-D-x-N-[SACV]-P
[1144] [1] Watson H.C., Littlechild J.A. Biochem. Soc. Trans. 18:187-190(1990).

[1145] 409. (PGM_PMM) Phosphoglucomutase and phosphomannomutase phosphoserine signature

- Phosphoglucomutase (EC 5.4.2.2) (PGM). PGM is an enzyme responsible for the conversion of D-glucose 1-phosphate into D-glucose 6-phosphate. PGM participates in both the breakdown and synthesis of glucose [1].
- Phosphomannomutase (EC 5.4.2.8) (PMM). PMM is an enzyme responsible for the conversion of D-mannose 1-phosphate into D-mannose 6-phosphate. PMM is required for different biosynthetic pathways in bacteria. For example, in enterobacteria such as Escherichia coli there are two different genes coding for this enzyme: rfbK, which is involved in the synthesis of the O antigen of lipopolysaccharide, and cpsG, which is required for the synthesis of the M antigen capsular polysaccharide [2]. In Pseudomonas aeruginosa PMM (gene algC) is involved in the biosynthesis of the alginate layer [3], and in Xanthomonas campestris (gene xanA) it is involved in the biosynthesis of xanthan [4]. In Rhizobium strain NGR234 (gene noeK) it is involved in the biosynthesis of the nod factor.
- Phosphoacetylglucosamine mutase (EC 5.4.2.3), which converts N-acetyl-D-glucosamine 1-phosphate into the 6-phosphate isomer.

The catalytic mechanism of both PGM and PMM involves the formation of a phosphoserine intermediate [1]. The sequence around the serine residue is well conserved and can be used as a signature pattern.

In addition to PGM and PMM there are at least three uncharacterized proteins that belong to this family [5,6]:

- Urease operon protein ureC from Helicobacter pylori.
- Escherichia coli protein mrsA.
- Paramecium tetraurelia parafusin, a phosphoglycoprotein involved in exocytosis.
- A Methanococcus vannielii hypothetical protein in the 3'region of the gene for ribosomal protein S10.

[1146] Consensus pattern: [GSA]-[LIVM]-x-[LIVM]-[ST]-[PGA]-S-H-x-P-x(4)-[GNHE] [S is the phosphoserine residue]

Note: PMM from fungi do not belong to this family.

[1] Dai J.-B., Liu Y., Ray W.J. Jr., Konno M. J. Biol. Chem. 267:6322-6337(1992).

[2] Stevenson G., Lee S.J., Romana L.K., Reeves P.R. Mol. Gen. Genet. 227:173-180(1991).

[3] Zielinski N.A., Chakrabarty A.M., Berry A. J. Biol. Chem. 266:9754-9763(1991).

[4] Koeplin R., Arnold W., Hoette B., Simon R., Wang G., Puehler A. J. Bacteriol. 174:191-199(1992).

[5] Bairoch A. Unpublished observations (1993).

[6] Subramanian S.V., Wyroba E., Andersen A.P., Satir B.H. Proc. Natl. Acad. Sci. U.S.A. 91:9832-9836(1994).
[1147] 410. PH domain profile

The 'pleckstrin homology' (PH) domain is a domain of about 100 residues that occurs in a wide range of proteins involved in intracellular signaling or as constituents of the cytoskeleton [1 to 7].

The function of this domain is not clear; several putative functions have been suggested:

- binding to the beta/gamma subunit of heterotrimeric G proteins,
- binding to lipids, e.g. phosphatidylinositol-4,5-bisphosphate,
- binding to phosphorylated Ser/Thr residues,
- attachment to membranes by an unknown mechanism.

It is possible that different PH domains have totally different ligand requirements.

The 3D structure of several PH domains has been determined [8]. All known cases have a common structure consisting of two perpendicular anti-parallel beta sheets, followed by a C-terminal amphipathic helix. The loops connecting the beta-strands differ greatly in length, making the PH domain relatively difficult to detect. There are no totally invariant residues within the PH domain.

Proteins reported to contain one or more PH domains belong to the following families:

- Pleckstrin, the protein where this domain was first detected, is the major substrate of protein kinase C in platelets. Pleckstrin is one of the rare proteins to contain two PH domains.
- Ser/Thr protein kinases such as the Akt/Rac family, the beta-adrenergic receptor kinases, the mu isoform of PKC and the trypanosomal NrkA family.
- Tyrosine protein kinases belonging to the Btk/Itk/Tec subfamily.
- Insulin Receptor Substrate 1 (IRS-1).
- Regulators of small G-proteins like guanine nucleotide releasing factor GNRP (Ras-GRF) (which contains 2 PH domains), guanine nucleotide exchange proteins like vav, dbl, SoS and yeast CDC24, GTPase activating proteins like rasGAP and BEM2/IPL2, and the human break point cluster protein bcr.
- Cytoskeletal proteins such as dynamin (see <PDOC00362>), Caenorhabditis elegans kinesin-like protein unc-104 (see <PDOC00343>), spectrin beta-chain, syntrophin (2 PH domains) and yeast nuclear migration protein NUM1.
- Mammalian phosphatidylinositol-specific phospholipase C (PI-PLC) (see <PDOC50007>) isoforms gamma and delta. Isoform gamma contains two PH domains; the second one is split into two parts separated by about 400 residues.
- Oxysterol binding proteins OSBP, yeast OSH1 and YHR073w.
- Mouse protein citron, a putative rho/rac effector that binds to the GTP-bound forms of rho and rac.
- Several yeast proteins involved in cell cycle regulation and bud formation like BEM2, BEM3, BUD4 and the BEM1-binding proteins BOI2 (BEB1) and BOI1 (BOB1).
- Caenorhabditis elegans protein MIG-10.
- Caenorhabditis elegans hypothetical proteins C04D8.1, K06H7.4 and ZK632.12.
- Yeast hypothetical proteins YBR129c and YHR155w.

The profile for the PH domain, which has been developed by Toby Gibson at the EMBL, covers the total length of the
domain. Several proteins contain large insertions in the PH domain and are thus difficult to detect with this profile. In
some of these cases, the profile will align only to one half of the PH domain.

- Sequences known to belong to this class detected by the pattern: ALL. But it should be noted that while all sequences
containing PH domains are detected, not all PH domains are. Some of the split domains lie below the
cutoff threshold.



[1] Mayer B.J., Ren R., Clark K.L., Baltimore D. Cell 73:629-630(1993).
[2] Haslam R.J., Koide H.B., Hemmings B.A. Nature 363:309-310(1993).
[3] Musacchio A., Gibson T.J., Rice P., Thompson J., Saraste M. Trends Biochem. Sci. 18:343-348(1993).
[4] Gibson T.J., Hyvonen M., Musacchio A., Saraste M., Birney E. Trends Biochem. Sci. 19:349-353(1994).
[5] Pawson T. Nature 373:573-580(1995).
[6] Ingley E., Hemmings B.A. J. Cell. Biochem. 56:436-443(1994).
[7] Saraste M., Hyvonen M. Curr. Opin. Struct. Biol. 5:403-408(1995).
[8] Riddihough G. Nat. Struct. Biol. 1:755-757(1994).



411. PHD-finger 
[1] 

Medline: 95216093

The PHD finger: implications for chromatin-mediated transcriptional regulation.
Aasland R, Gibson TJ, Stewart AF; 

Trends Biochem Sci 1995;20:56-59. 
Number of members: 181 

[1148] 412. (PI-PLC-X) Phosphatidylinositol-specific phospholipase C profiles. Phosphatidylinositol-specific phospholipase C (EC 3.1.4.11), an eukaryotic intracellular enzyme, plays an important role in signal transduction processes [1]. It catalyzes the hydrolysis of 1-phosphatidyl-D-myo-inositol-3,4,5-triphosphate into the second messenger molecules diacylglycerol and inositol-1,4,5-triphosphate. This catalytic process is tightly regulated by reversible phosphorylation and binding of regulatory proteins [2 to 4].

In mammals, there are at least 6 different isoforms of PI-PLC; they differ in their domain structure, their regulation, and their tissue distribution. Lower eukaryotes also possess multiple isoforms of PI-PLC.

All eukaryotic PI-PLCs contain two regions of homology, sometimes referred to as 'X-box' and 'Y-box'. The order of these two regions is always the same (NH2-X-Y-COOH), but the spacing is variable. In most isoforms, the distance between these two regions is only 50-100 residues, but in the gamma isoforms one PH domain, two SH2 domains, and one SH3 domain are inserted between the two PLC-specific domains. The two conserved regions have been shown to be important for the catalytic activity. At the C-terminal of the Y-box, there is a C2 domain (see <PDOC00380>) possibly involved in Ca-dependent membrane attachment.

Profile analysis shows that sequences with significant similarity to the X-box domain occur also in prokaryotic and trypanosome PI-specific phospholipases C. Apart from this region, the prokaryotic enzymes show no similarity to their eukaryotic counterparts.
Two profiles were developed, one covering the X-box, the other the Y-box.

[1] Meldrum E., Parker P.J., Carozzi A. Biochim. Biophys. Acta 1092:49-71(1991).
[2] Rhee S.G., Choi K.D. Adv. Second Messenger Phosphoprotein Res. 26:35-61(1992).
[3] Rhee S.G., Choi K.D. J. Biol. Chem. 267:12393-12396(1992).

[4] Sternweis P.C., Smrcka A.V. Trends Biochem. Sci. 17:502-506(1992).

[1149] 413. (PI-PLC-Y) Phosphatidylinositol-specific phospholipase C profiles

Phosphatidylinositol-specific phospholipase C (EC 3.1.4.11), an eukaryotic intracellular enzyme, plays an important role in signal transduction processes [1]. It catalyzes the hydrolysis of 1-phosphatidyl-D-myo-inositol-3,4,5-triphosphate into the second messenger molecules diacylglycerol and inositol-1,4,5-triphosphate. This catalytic process is tightly regulated by reversible phosphorylation and binding of regulatory proteins [2 to 4].

In mammals, there are at least 6 different isoforms of PI-PLC; they differ in their domain structure, their regulation, and their tissue distribution. Lower eukaryotes also possess multiple isoforms of PI-PLC.

All eukaryotic PI-PLCs contain two regions of homology, sometimes referred to as 'X-box' and 'Y-box'. The order of these two regions is always the same (NH2-X-Y-COOH), but the spacing is variable. In most isoforms, the distance between these two regions is only 50-100 residues, but in the gamma isoforms one PH domain, two SH2 domains, and one SH3 domain are inserted between the two PLC-specific domains. The two conserved regions have been shown to be important for the catalytic activity. At the C-terminal of the Y-box, there is a C2 domain (see <PDOC00380>) possibly involved in Ca-dependent membrane attachment.

Profile analysis shows that sequences with significant similarity to the X-box domain occur also in prokaryotic and trypanosome PI-specific phospholipases C. Apart from this region, the prokaryotic enzymes show no similarity to their eukaryotic counterparts.

Two profiles were developed, one covering the X-box, the other the Y-box.

[1] Meldrum E., Parker P.J., Carozzi A. Biochim. Biophys. Acta 1092:49-71(1991).

[2] Rhee S.G., Choi K.D. Adv. Second Messenger Phosphoprotein Res. 26:35-61(1992).

[3] Rhee S.G., Choi K.D. J. Biol. Chem. 267:12393-12396(1992).

[4] Sternweis P.C., Smrcka A.V. Trends Biochem. Sci. 17:502-506(1992).

[1150] 414. (PK) Pyruvate kinase active site signature

Pyruvate kinase (EC 2.7.1.40) (PK) [1] catalyzes the final step in glycolysis, the conversion of phosphoenolpyruvate to pyruvate with the concomitant phosphorylation of ADP to ATP. PK requires both magnesium and potassium ions for its activity. PK is found in all living organisms. In vertebrates there are four tissue-specific isozymes: L (liver), R (red cells), M1 (muscle, heart, and brain), and M2 (early fetal tissues). In Escherichia coli there are two isozymes: PK-I (gene pykF) and PK-II (gene pykA). All PK isozymes seem to be tetramers of identical subunits of about 500 amino acid residues.

As a signature pattern for PK, a conserved region was selected that includes a lysine residue which seems to be the acid/base catalyst responsible for the interconversion of pyruvate and enolpyruvate, and a glutamic acid residue implicated in the binding of the magnesium ion.

[1151] Consensus pattern: [LIVAC]-x-[LIVM](2)-[SAPCV]-K-[LIV]-E-[NKRST]-x-[DEQHS]-[GSTA]-[LIVM] [K is the active site residue] [E is a magnesium ligand]

[1152] [1] Muirhead H. Biochem. Soc. Trans. 18:193-196(1990).

[1153] 415. (PLDc) Phospholipase D. Active site motif

Phosphatidylcholine-hydrolyzing phospholipase D (PLD) isoforms are activated by ADP-ribosylation factors (ARFs). PLD produces phosphatidic acid from phosphatidylcholine, which may be essential for the formation of certain types of transport vesicles or may be constitutive vesicular transport to signal transduction pathways. PC-hydrolyzing PLD is a homologue of cardiolipin synthase, phosphatidylserine synthase, bacterial PLDs and viral proteins. Each of these appears to possess a domain duplication which is apparent by the presence of two motifs containing well-conserved histidine, lysine, and/or asparagine residues which may contribute to the active site, aspartic acid. An E. coli endonuclease (nuc) and similar proteins appear to be PLD homologues but possess only one of these motifs. The profile contained here represents only the putative active site regions, since an accurate multiple alignment of the repeat units has not been achieved.
Number of members: 139

[1]
Medline: 96303814
A novel family of phospholipase D homologues that includes phospholipid synthases and putative endonucleases: identification of duplicated repeats and potential active site residues.
Ponting CP, Kerr ID;
Protein Sci 1996;5:914-922.

[2] Medline: 9^3o42<-n
A duplicated catalytic motif in a new superfamily of phosphohydrolases and phospholipid synthases that includes poxvirus envelope proteins.
Koonin EV;
Trends Biochem Sci 1996;21:242-243.

[3] Medline: 94327597
Cloning and expression of phosphatidylcholine-hydrolyzing phospholipase D from Ricinus communis L.
Wang X, Xu L, Zheng L;
J Biol Chem 1994;269:20312-20317.

[4] Medline: 97386825
Regulation of eukaryotic phosphatidylinositol-specific phospholipase C and phospholipase D.
Singer WD, Brown HA, Sternweis PC;
Annu Rev Biochem 1997;66:475-509.
[1154] 416. (PMI_type1) Phosphomannose isomerase type I signatures

Phosphomannose isomerase (EC 5.3.1.8) (PMI) [1,2] is the enzyme that catalyzes the interconversion of mannose-6-phosphate and fructose-6-phosphate. In eukaryotes, it is involved in the synthesis of GDP-mannose, which is a constituent of N- and O-linked glycans as well as GPI anchors. In prokaryotes, it is involved in a variety of pathways including capsular polysaccharide biosynthesis and D-mannose metabolism.

Three classes of PMI have been defined on the basis of sequence similarities [1]. The first class comprises all known eukaryotic PMI as well as the enzyme encoded by the manA gene in enterobacteria such as Escherichia coli. Class I PMI's are proteins of about 42 to 50 Kd which bind a zinc ion essential for their activity.

As signature patterns for class I PMI, two conserved regions were selected. The first one is located in the N-terminal section of these proteins, the second in the C-terminal half. Both patterns contain a residue involved [3] in the binding of the zinc ion.

[1155] Consensus pattern: Y-x-D-x-N-H-K-P-E [E is a zinc ligand]

- Consensus pattern: H-A-Y-[LIVM]-x-G-x(2)-[LIVM]-E-x-M-A-x-S-D-N-x-[LIVM]-R-A-G-x-T-P-K [H is a zinc ligand]
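
The consensus patterns in this listing follow the usual PROSITE notation: elements separated by hyphens, `x` for any residue, square brackets for a set of allowed residues, curly braces for excluded residues, and a count in parentheses for repeats. As an illustration only, the following minimal Python sketch converts such a pattern into a regular expression and scans a sequence with it. The function name `prosite_to_regex` and the test sequence are hypothetical, the sketch assumes a well-formed pattern, and it does not handle the `<` and `>` terminus anchors.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style consensus pattern into a Python regex string."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        # Separate an optional repeat suffix such as (2) or (2,4) from the element.
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = match.groups()
        if core == "x":
            core = "."                       # x: any residue
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"   # {..}: any residue except those listed
        # [..] sets and single residue letters are already valid regex syntax.
        if low and high:
            core += "{" + low + "," + high + "}"
        elif low:
            core += "{" + low + "}"
        parts.append(core)
    return "".join(parts)


# Hypothetical usage with the first class I PMI pattern and a made-up sequence.
regex = prosite_to_regex("Y-x-D-x-N-H-K-P-E")
hit = re.search(regex, "AAYGDHNHKPEAA")
print(regex, hit.start() if hit else None)
```

A regular expression is sufficient for signature patterns of this kind; the profiles mentioned elsewhere in this listing are position-specific scoring matrices and need a dedicated profile search tool instead.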

[1] Proudfoot A.E.I., Turcatti G., Wells T.N.C., Payton M.A., Smith D.J. Eur. J. Biochem. 219:415-423(1994).
[2] Coulin F., Magnenat E., Proudfoot A.E.I., Payton M.A., Scully R., Wells T.N.C. Biochemistry 32:14139-14144(1993).

[3] Cleasby A., Wonacott A., Skarzynski T., Hubbard R.E., Davies G.J., Proudfoot A.E.I., Bernard A.R., Payton
M.A., Wells T.N.C. Nat. Struct. Biol. 3:470-479(1996).

[1156] 417. (PNP_UDP_1) Purine and other phosphorylases family 1 signature
The following phosphorylases belong to the same family:

- Purine nucleoside phosphorylase (EC 2.4.2.1) (PNP) from most bacteria (gene deoD). This enzyme catalyzes the cleavage of guanosine or inosine to respective bases and sugar-1-phosphate molecules [1].
- Uridine phosphorylase (EC 2.4.2.3) (UdRPase) from bacteria (gene udp) and mammals. Catalyzes the cleavage of uridine into uracil and ribose-1-phosphate. The products of the reaction are used either as carbon and energy sources or in the rescue of pyrimidine bases for nucleotide synthesis [2].
- 5'-methylthioadenosine phosphorylase (EC 2.4.2.28) (MTA phosphorylase) from Sulfolobus solfataricus [3].

As a signature pattern, a conserved region was selected in the central part of these enzymes.
[1157] Consensus pattern: [GST]-x-G-[LIVM]-G-x-[PA]-S-x-[GSTA]-I-x(3)-E-L

Note: mammalian and some bacterial PNP as well as eukaryotic MTA phosphorylase belong
to a different family of phosphorylases (see <PDOC00954>).

[1] Takehara M., Ling R., Izawa S., Inoue Y., Kimura A. Biosci. Biotechnol. Biochem. 59:1987-1990(1995).

[2] Watanabe S.-I., Hino A., Wada K., Eliason J.F., Uchida T. J. Biol. Chem. 270:12191-12196(1995).
[3] Cacciapuoti G., Porcelli M., Bertoldo C., De Rosa M., Zappia V. J. Biol. Chem. 269:24752-24769(1994).

[1158] 418. (PP2C) Protein phosphatase 2C signature 
Protein phosphatase 2C (PP2C) is one of the four major classes of mammalian serine/threonine specific protein phosphatases (EC 3.1.3.16). PP2C [1] is a monomeric enzyme of about 42 Kd which shows broad substrate specificity and is dependent on divalent cations (mainly manganese and magnesium) for its activity. Its exact physiological role is still unclear. Three isozymes are currently known in mammals: PP2C-alpha, -beta and -gamma. In yeast, there are at least four PP2C homologs: phosphatase PTC1 [2], which has weak tyrosine phosphatase activity in addition to its activity on serines, phosphatases PTC2 and PTC3, and hypothetical protein YBR125c. Isozymes of PP2C are also known from Arabidopsis thaliana (ABI1, PPH1), Caenorhabditis elegans (FEM-2, F42G9.1, T23F11.1), Leishmania chagasi and Paramecium tetraurelia.

In Arabidopsis thaliana, the kinase associated protein phosphatase (KAPP) [3] is an enzyme that dephosphorylates the Ser/Thr receptor-like kinase RLK5 and which contains a C-terminal PP2C domain.

PP2C does not seem to be evolutionary related to the main family of serine/threonine phosphatases: PP1, PP2A and PP2B. However, it is significantly similar to the catalytic subunit of pyruvate dehydrogenase phosphatase (EC 3.1.3.43) (PDPC) [4], which catalyzes dephosphorylation and concomitant reactivation of the alpha subunit of the E1 component of the pyruvate dehydrogenase complex. PDPC is a mitochondrial enzyme and, like PP2C, is magnesium-dependent.

As a signature pattern, the best conserved region was selected, which is located in the N-terminal part and contains a perfectly conserved tripeptide. This region includes a conserved aspartate residue involved in divalent cation binding [5].

[1159] Consensus pattern: [LIVMFY]-[LIVMFYA]-[GSAC]-[LIVM]-[FYC]-D-G-H-[GAV]

Note: PP2C belongs [6] to a superfamily which also includes bacterial proteins such as Bacillus spoIIE, rsbU and
rsbX, Synechocystis PCC 6803 icfG, as well as a domain in fungal adenylate cyclases.

[1] Wenk J., Trompeter H.-I., Pettrich K.-G., Cohen P.T.W., Campbell D.G., Mieskes G. FEBS Lett. 297:135-138(1992).
[2] Maeda T., Tsai A.Y.M., Saito H. Mol. Cell. Biol. 13:5408-5417(1993).
[3] Stone J.M., Collinge M.A., Smith R.D., Horn M.A., Walker J.C. Science 266:793-795(1994).
[4] Lawson J.E., Niu X.-D., Browning K.S., Trong H.L., Yan J., Reed L.J. Biochemistry 32:8987-8993(1993).
[5] Das A.K., Helps N.R., Cohen P.T.W., Barford D. EMBO J. 15:6798-6809(1996).
[6] Bork P., Brown N.P., Hegyi H., Schultz J. Protein Sci. 5:1421-1425(1996).

[1160] 419. (PPTA) Protein prenyltransferases alpha subunit repeat signature

Protein prenyltransferases catalyze the transfer of an isoprenyl moiety to a cysteine four residues from the C-terminus of several proteins. They are heterodimeric enzymes consisting of alpha and beta subunits. The alpha subunit is thought to participate in a stable complex with the isoprenyl substrate; the beta subunit binds the peptide substrate. Distinct protein prenyltransferases might share a common alpha subunit. Both the alpha and beta subunit show repetitive sequence motifs [1]. These repeats have distinct structural and functional implications and are unrelated to each other.
Known protein prenyltransferase alpha subunits are:

- Mammalian protein farnesyltransferase alpha subunit.
- Yeast protein RAM2, a protein farnesyltransferase alpha subunit.
- Yeast protein BET4, a protein geranylgeranyltransferase alpha subunit.

The conserved domain of the alpha subunit consists of about 34 amino acids and is repeated five times. It contains an invariant tryptophan, possibly involved in heterodimerization with the conserved phenylalanines in the repeated domains of the beta subunits, via hydrophobic bonds. The signature pattern for this domain is centered on the invariant tryptophan.

[1161] Consensus pattern: [PSIAV]-x-[NDFV]-[NEQIY]-x-[LIVMAGP]-W-[NQSTHF]-[FYHQ]-[LIVMR]
[1162] [1] Boguski M.S., Murray A.W., Powers S. New Biol. 4:408-411(1992).

[1163] 420. (PR55) Protein phosphatase 2A regulatory subunit PR55 signatures

Protein phosphatase 2A (PP2A) is a serine/threonine phosphatase involved in many aspects of cellular function, including the regulation of metabolic enzymes and proteins involved in signal transduction. PP2A is a trimeric enzyme that consists of a core composed of a catalytic subunit associated with a 65 Kd regulatory subunit (PR65), also called subunit A; this complex then associates with a third variable subunit (subunit B), which confers distinct properties to the holoenzyme [1]. One of the forms of the variable subunit is a 55 Kd protein (PR55) which is highly conserved in mammals (where three isoforms are known to exist), Drosophila and yeast (gene CDC55). This subunit could perform a substrate recognition function or be responsible for targeting the enzyme complex to the appropriate subcellular compartment.

As signature patterns, two perfectly conserved sequences of 15 residues were selected, one located in the N-terminal
region, the other in the center of the protein.

[1164] Consensus pattern: E-F-D-Y-L-K-S-L-E-I-E-E-K-I-N

Consensus pattern: N-[AG]-H-[TA]-Y-H-I-N-S-I-S-[LIVM]-N-S-D

[1165] [1] Mayer-Jaekel R.E., Hemmings B.A. Trends Cell Biol. 4:287-291(1994).
[1166] 421. N-(5'-phosphoribosyl)anthranilate (PRA) isomerase

[1] Wilmanns M, Priestle JP, Niermann T, Jansonius JN;
J Mol Biol 1992;223:477-507.

[1167] 422. (PRK) Phosphoribulokinase signature

Phosphoribulokinase (EC 2.7.1.19) (PRK) [1,2] is one of the enzymes specific to the Calvin's reductive pentose phosphate cycle, which is the major route by which carbon dioxide is assimilated and reduced by autotrophic organisms. PRK catalyzes the ATP-dependent phosphorylation of ribulose 5-phosphate into ribulose-1,5-bisphosphate, which is the substrate for RubisCO. PRK's of diverse origins show different properties with respect to the size of the protein, the subunit structure, or the enzymatic regulation. However, an alignment of the sequences of PRK from plants, algae, photosynthetic and chemoautotrophic bacteria shows that there are a few regions of sequence similarity. As a signature
pattern one of these regions was selected.

[1168] Consensus pattern: K-[LIVM]-x-R-D-x(3)-R-G-x-[ST]-x-E

[1] Kossmann J., Klintworth R., Bowien B. Gene 85:247-252(1989).
[2] Gibson J.L., Chen J.-H., Tower P.A., Tabita F.R. Biochemistry 29:8085-8093(1990).

[1169] 423. (PRPP_synt) Phosphoribosyl pyrophosphate synthetase signature

Phosphoribosyl pyrophosphate synthetase (EC 2.7.6.1) (PRPP synthetase) catalyzes the formation of PRPP from ATP and ribose 5-phosphate. PRPP is then used in various biosynthetic pathways, as for example in the formation of purines, pyrimidines, histidine and tryptophan. PRPP synthetase requires inorganic phosphate and magnesium ions for its
stability and activity.

In mammals, three isozymes of PRPP synthetase are found; in yeast there are at least four isozymes.
As a signature pattern for this enzyme, a very conserved region was selected that has been suggested to be involved in binding divalent cations [1]. This region contains two conserved aspartic acid residues as well as a histidine, which
are all potential ligands for a cation such as magnesium.

[1170] Consensus pattern: D-[LI]-H-[SA]-x-Q-[IMST]-[QM]-G-[FY]-F-x(2)-P-[LIVMFC]-D

[1171] [1] Bower S.G., Harlow K.W., Switzer R.L., Hove-Jensen B. J. Biol. Chem. 264:10287-10291(1989).

[1172] 424. (PRTP) Herpesvirus processing and transport protein

The members of this family are associated with capsid intermediates during packaging of the virus.
Number of members: 31
[1]

Medline: 98362148

Herpes simplex virus type 1 cleavage and packaging proteins
UL15 and UL28 are associated with B but not C capsids during
packaging. Yu D, Weller SK;

J Virol 1998;72:7428-7439.
[1173] 425. Photosystem I psaG / psaK (PSI_PSAK) proteins signature

Photosystem I (PSI) [1] is an integral membrane protein complex that uses light energy to mediate electron transfer from plastocyanin to ferredoxin. It is found in the chloroplasts of plants and cyanobacteria. PSI is composed of at least 14 different subunits, two of which, PSI-G (gene psaG) and PSI-K (gene psaK), are small hydrophobic proteins of about 7 to 9 Kd and evolutionary related [2]. Both seem to contain two transmembrane regions. Cyanobacteria seem to
encode only for PSI-K.

[1174] As a signature pattern the best-conserved region was selected, which seems to correspond to the second
transmembrane region.

- Consensus pattern: [GT]-F-x-[LIVM]-x-[DEA]-x(2)-[GA]-x-[GTA]-[SA]-x-G-H-x-[LIVM]-[GA]
[1] Golbeck J.H. Biochim. Biophys. Acta 895:167-204(1987).
[2] Kjaerulff S., Andersen B., Nielsen V.S., Moller B.L., Okkels J.S. J. Biol. Chem. 268:18912-18916(1993).



[1175] 426. PTR2 family proton/oligopeptide symporters signatures

A family of eukaryotic and prokaryotic proteins that seem to be mainly involved in the intake of small peptides with the concomitant uptake of a proton has been recently characterized [1,2]. Proteins that belong to this family are:

- Fungal peptide transporter PTR2.
- Mammalian intestine proton-dependent oligopeptide transporter PeptT1.
- Mammalian kidney proton-dependent oligopeptide transporter PeptT2.
- Drosophila opt1.
- Arabidopsis thaliana peptide transporters PTR2-A and PTR2-B (also known as the histidine transporting protein NTR1).
- Arabidopsis thaliana proton-dependent nitrate/chlorate transporter CHL1.
- Lactococcus proton-dependent di- and tri-peptide transporter dtpT.
- Caenorhabditis elegans hypothetical protein C06G8.2.
- Caenorhabditis elegans hypothetical protein F56F4.5.
- Caenorhabditis elegans hypothetical protein K04E7.2.
- Escherichia coli hypothetical protein ybgH.
- Escherichia coli hypothetical protein ydgR.
- Escherichia coli hypothetical protein yhiP.
- Escherichia coli hypothetical protein yjdL.
- Bacillus subtilis hypothetical protein yclF.

These integral membrane proteins are predicted to comprise twelve transmembrane regions. As signature patterns, two of the best conserved regions were selected. The first is a region that includes the end of the second transmembrane region, a cytoplasmic loop as well as the third transmembrane region. The second pattern corresponds to the core of
the fifth transmembrane region.

- Consensus pattern: [GA]-[GAS]-[LIVMFYWA]-[LIVM]-[GAS]-D-x-[LIVMFYWT]-[LIVMFYW]-G-x(3)-[TAV]-[IV]-x(3)-[GSTAV]-x-[LIVMF]-x(3)-[GA]

- Consensus pattern: [FYT]-x(2)-[LMFY]-[FYV]-[LIVMFYWA]-x-[IVG]-N-[LIVMAG]-G-[GSA]-[LIMF]
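
Because this family is described by two signatures, it can be useful to record which of them a candidate sequence satisfies. The Python sketch below is a hedged illustration rather than the procedure used in this disclosure: it assumes the two patterns read as `[GA]-[GAS]-[LIVMFYWA]-[LIVM]-[GAS]-D-x-[LIVMFYWT]-[LIVMFYW]-G-x(3)-[TAV]-[IV]-x(3)-[GSTAV]-x-[LIVMF]-x(3)-[GA]` and `[FYT]-x(2)-[LMFY]-[FYV]-[LIVMFYWA]-x-[IVG]-N-[LIVMAG]-G-[GSA]-[LIMF]`, and the names `signature_1`, `signature_2` and `signature_hits` are hypothetical labels.

```python
import re

# Regex forms of the two patterns, as read from the text (an assumption).
SIGNATURES = {
    "signature_1": re.compile(
        r"[GA][GAS][LIVMFYWA][LIVM][GAS]D.[LIVMFYWT][LIVMFYW]G.{3}"
        r"[TAV][IV].{3}[GSTAV].[LIVMF].{3}[GA]"
    ),
    "signature_2": re.compile(
        r"[FYT].{2}[LMFY][FYV][LIVMFYWA].[IVG]N[LIVMAG]G[GSA][LIMF]"
    ),
}

def signature_hits(seq):
    """Map each signature name to True if it occurs anywhere in seq."""
    return {name: bool(rx.search(seq)) for name, rx in SIGNATURES.items()}

# Made-up fragment that satisfies only the second pattern.
print(signature_hits("MKFAALFLAINLGGLTT"))
```

Reporting the two results separately, instead of a single yes/no answer, keeps partial matches visible for sequences that carry only one of the conserved regions.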

[1] Paulsen I.T., Skurray R.A. Trends Biochem. Sci. 19:404-404(1994).
[2] Steiner H.-Y., Naider F., Becker J.M. Mol. Microbiol. 16:825-834(1995).

[1176] 427. Pumilio-family RNA binding domains (aka PUM-HD, Pumilio homology domain)
Puf domains are necessary and sufficient for sequence specific RNA binding in fly Pumilio and worm FBF-1 and FBF-2. Both proteins function as translational repressors in early embryonic development by binding sequences in the 3' UTR of target mRNAs (e.g. the nanos response element (NRE) in fly Hunchback mRNA, or the point mutation element (PME) in worm fem-3 mRNA). Other proteins that contain Puf domains are also plausible RNA binding proteins. JSN1_YEAST, for instance, appears to also contain a single RRM
domain by HMM analysis.

Puf domains usually occur as a tandem repeat of 8 domains.

The Pfam model does not necessarily recognize all 8 domains in all sequences; some sequences appear to have 5 or 6 domains on initial analysis, but further analysis suggests the presence of additional divergent domains.
[1177] [1] Zhang B, Gallegos M, Puoti A, Durkin E, Fields S, Kimble J, Wickens MP; Nature 1997;390:477-484.
[2] Zamore PD, Williamson JR, Lehmann R; RNA 1997;3:1421-1433.
[1178] 428. PWWP domain. The PWWP domain is named after a conserved Pro-Trp-Trp-Pro motif. The function of
the domain is currently unknown. Number of members: 19
[1179] [1] Medline: 98282232. WHSC1, a 90 kb SET domain-containing gene, expressed in early development and
homologous to a Drosophila dysmorphy gene maps in the Wolf-Hirschhorn syndrome critical region and is fused to
IgH in t(4;14) multiple myeloma. Stec I, Wright TJ, van Ommen GJB, de Boer PAJ, van Haeringen A, Moorman AFM,
Altherr MR, den Dunnen JT; Hum Mol Genet 1998;7:1071-1082.
[1180] 429. PX domain

Eukaryotic domain of unknown function present in phox proteins, PLD isoforms, a PI3K isoform.
Number of members: 71
[1]

Medline: 97084820

Novel domains in NADPH oxidase subunits, sorting nexins, and
PtdIns 3-kinases: binding partners of SH3 domains?
Ponting CP;
Protein Sci 1996;5:2353-2357.
[1181] 430. ParA family ATPase 
[1]

Medline: 91141297

A family of ATPases involved in active partitioning of diverse bacterial plasmids.
Motallebi-Veshareh M, Rouch DA, Thomas CM;
Mol Microbiol 1990;4:1455-1463.
Number of members: 122

[1182] 431. (Parvo_coat) Parvovirus coat protein. Number of members: 72
[1183] 432. Pectinesterase signatures

Pectinesterase (EC 3.1.1.11) (pectin methylesterase) catalyzes the hydrolysis of pectin into pectate and methanol. In plants, it plays an important role in cell wall metabolism during fruit ripening. In plant bacterial pathogens such as Erwinia carotovora and in fungal pathogens such as Aspergillus niger, pectinesterase is involved in maceration and
soft-rotting of plant tissue.

Prokaryotic and eukaryotic pectinesterases share a few regions of sequence similarity [1,2,3]. Two of these regions
were selected as signature patterns.
The first is based on a region in the N-terminal section of these enzymes; it contains a conserved tyrosine which may play a role in the catalytic mechanism [3]. The second pattern corresponds to the best conserved region, an octapeptide
located in the central part of these enzymes.

- Consensus pattern: [GSTNP]-x(6)-[FYVHR]-[IVN]-[KEP]-x-G-[STIVKRQ]-Y-[DNQKRMV]-[EP]-x(3)-[LIMVA]
- Consensus pattern: [IV]-x-G-[STAD]-[LIVT]-D-[FYI]-[IV]-[FSN]-G

[1] Ray J., Knapp J., Grierson D., Bird C., Schuch W. Eur. J. Biochem. 174:119-124(1988).
[2] Plastow G.S. Mol. Microbiol. 2:247-254(1988).
[3] Markovic O., Joernvall H. Protein Sci. 1:1288-1292(1992).

[1184] 433. Pentapeptide repeats (8 copies)

These repeats are found in many cyanobacterial proteins.

The repeats were first identified in hglK [1]. The function of these repeats is unknown.
The structure of this repeat has been predicted to be a beta-helix [2].
The repeat can be approximately described as A(D/N)LXX, where X can be any amino acid. Number of members: 75
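
The approximate description A(D/N)LXX can be turned into a simple tandem-repeat scan. The Python sketch below is an illustration under stated assumptions, not the Pfam model itself: it treats the five-residue unit as a strict regular expression, so it will undercount divergent copies, and the function name `longest_tandem_run` and the test sequence are hypothetical.

```python
import re

# One repeat unit: A, then D or N, then L, then any two residues.
REPEAT = r"A[DN]L.."

def longest_tandem_run(seq: str) -> int:
    """Return the largest number of back-to-back A(D/N)LXX units in seq."""
    runs = re.finditer(r"(?:%s)+" % REPEAT, seq)
    return max((len(m.group(0)) // 5 for m in runs), default=0)

# Made-up sequence carrying three consecutive units.
print(longest_tandem_run("MK" + "ADLSG" + "ANLRG" + "ADLTQ" + "PP"))
```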
[1]

Medline: 96062225

The hglK gene is required for localization of heterocyst-specific glycolipids in the cyanobacterium
Anabaena sp. strain PCC 7120.
Black K, Buikema WJ, Haselkorn R;
J Bacteriol 1995;177:6440-6448.
[2] Medline: 98318059
Structure and distribution of pentapeptide repeats in bacteria.
Bateman A, Murzin A, Teichmann SA;
Protein Sci 1998;7:1477-1480.

[3] Medline: 98316713

Characterisation of an Arabidopsis cDNA encoding a thylakoid lumen protein related to a novel 'pentapeptide repeat'
family of proteins.
Kieselbach T, Mant A, Robinson C, Schroder WP;
FEBS Lett 1998;428:241-244.

[1185] 434. Polypeptide deformylase
[1]

Medline: 97002011

A new subclass of the zinc metalloproteases superfamily revealed by the solution structure of peptide deformylase.
Meinnel T, Blanquet S, Dardel F;
J Mol Biol 1996;262:375-381.
[2] Medline: 98332750
Solution structure of nickel-peptide deformylase.
Dardel F, Ragusa S, Lazennec C, Blanquet S, Meinnel T;
J Mol Biol 1998;280:501-513.

Number of members: "^1 

[1186] 435. Peptidyl-tRNA hydrolase signatures

Peptidyl-tRNA hydrolase (EC 3.1.1.29) (PTH) is a bacterial enzyme that cleaves peptidyl-tRNA or N-acyl-aminoacyl-tRNA to yield free peptides or N-acyl-amino acids and tRNA. The natural substrate for this enzyme may be peptidyl-tRNA which drop off the ribosome during protein synthesis [1,2]. Bacterial PTH has been found [2,3] to be evolutionary
related to yeast hypothetical protein YHR189w.

PTH and YHR189w are proteins of about 200 amino acid residues. As signature patterns, two conserved regions were selected that each contain an histidine. The first of these regions is located in the N-terminal section, the other in the central part.



- Consensus pattern: [FY]-x(2)-T-R-H-N-x-G-x(2)-[LIVMFA](2)-[DE]

- Consensus pattern: [GS]-x(3)-H-N-G-[LIVM]-[KR]-[DNS]-[LIVMT]

[1] Garcia-Villegas M.R., De La Vega F.M., Galindo J.M., Segura M., Buckingham R.H., Guarneros G. EMBO J. 10:3549-3555(1991).
[2] De La Vega F.M., Galindo J.M., Old I.G., Guarneros G. Gene 169:97-100(1996).
[3] Ouzounis C., Bork P., Casari G., Sander C. Protein Sci. 4:2424-2428(1995).

[1187] 436. (Peptidase_M17) Cytosol aminopeptidase signature

Cytosol aminopeptidase is a eukaryotic cytosolic zinc-dependent exopeptidase that catalyzes the removal of unsubstituted amino-acid residues from the N-terminus of proteins. This enzyme is often known as leucine aminopeptidase (EC 3.4.11.1) (LAP) but has been shown [1] to be identical with prolyl aminopeptidase (EC 3.4.11.5). Cytosol aminopeptidase is a hexamer of identical chains, each of which binds two zinc ions.

Cytosol aminopeptidase is highly similar to Escherichia coli pepA, a manganese dependent aminopeptidase. Residues
involved in zinc ion-binding [2] in the mammalian enzyme are absolutely conserved in pepA where they presumably
bind manganese.

A cytosol aminopeptidase from Rickettsia prowazekii [3] and one from Arabidopsis thaliana also belong to this family.
As a signature pattern for these enzymes, a perfectly conserved octapeptide was selected which contains two residues
involved in binding metal ions: an aspartate and a glutamate.

Consensus pattern: N-T-D-A-E-G-R-L [The D and the E are zinc/manganese ligands]
Note: these proteins belong to family M17 in the classification of peptidases [4,E1].
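
Since the octapeptide is invariant, the metal-binding residues can be located by a plain substring search. The following Python sketch is illustrative only: the function name `metal_ligand_positions` and the test sequence are hypothetical, and the offsets simply reflect that D is the third and E the fifth residue of N-T-D-A-E-G-R-L.

```python
def metal_ligand_positions(seq):
    """Return 1-based (D, E) positions for each NTDAEGRL octapeptide in seq."""
    motif = "NTDAEGRL"
    hits = []
    start = seq.find(motif)
    while start != -1:
        # start is 0-based; D sits at offset 2 and E at offset 4 of the motif.
        hits.append((start + 3, start + 5))
        start = seq.find(motif, start + 1)
    return hits

# Made-up fragment with one copy of the octapeptide.
print(metal_ligand_positions("GGNTDAEGRLKK"))
```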

[1] Matsushima M., Takahashi T., Ichinose M., Miki K., Kurokawa K., Takahashi K. Biochem. Biophys. Res. Commun. 178:1459-1464(1991).
[2] Burley S.K., David P.R., Sweet R.M., Taylor A., Lipscomb W.N. J. Mol. Biol. 224:113-140(1992).
[3] Wood D.O., Solomon M.J., Speed R.R. J. Bacteriol. 175:159-165(1993).
[4] Rawlings N.D., Barrett A.J. Meth. Enzymol. 248:183-228(1995).

[1188] 437. Assemblin (Peptidase family S21)

Medline: 96399137

Three-dimensional structure of human cytomegalovirus protease.
Shieh HS, Kurumbail RG, Stevens AM, Stegeman RA, Sturman EJ,
Pak JY, Wittwer AJ, Palmier MO, Wiegand RC, Holwerda BC,
Stallings WC;
Nature 1996;383:279-282.
Number of members: 29 

[1189] 438. Pollen proteins Ole e I family signature

The following plant pollen proteins, whose biological function is not yet known, are structurally related [1]:

- Olive tree pollen major allergen (Ole e I).
- Tomato anther-specific protein LAT52.
- Maize pollen-specific protein ZmC13.

These proteins are most probably secreted and consist of about 145 residues. As shown in the following schematic representation, there are six cysteines which are conserved in the sequence of these proteins. They seem to be involved in disulfide bonds.



xxxxxxCxCxxxxxxxxxC^xxxxxxxxxxxxxxxxGucx^^ 

******
'C': conserved cysteine involved in a disulfide bond.
'*': position of the pattern.

Consensus pattern: [EQ]-G-x-V-Y-C-D-T-C-R [The two C's are probably involved in disulfide bonds]
[1190] [1] Villalba M., Batanero E., Lopez-Otin C., Sanchez L.M., Monsalve R.I., Gonzalez De La Pena M.A., Lahoz C., Rodriguez R. Eur. J. Biochem. 216:863-869(1993).
[1191] 439 Pollen allergen 

This family contains allergens Lol PI, PII and PIII from Lolium perenne.
Number of members: 49
[1]

Medline: 90105394

Complete primary structure of a Lolium perenne (perennial rye grass) pollen allergen, Lol p III: comparison with known
Lol p I and II sequences.
Ansari AA, Shenbagamurthi P, Marsh DG;
Biochemistry 1989;28:8665-8670.
[1192] 440. Porphobilinogen deaminase cofactor-binding site

Porphobilinogen deaminase (EC 4.3.1.8), or hydroxymethylbilane synthase, is an enzyme involved in the biosynthesis of porphyrins and related macrocycles. It catalyzes the assembly of four porphobilinogen (PBG) units in a head to tail fashion to form hydroxymethylbilane.

The enzyme covalently binds a dipyrromethane cofactor to which the PBG subunits are added in a stepwise fashion. In the Escherichia coli enzyme (gene hemC) this cofactor has been shown [1] to be bound by the sulfur atom of a cysteine. The region around this cysteine is conserved in porphobilinogen deaminases from various prokaryotic and
eukaryotic sources.



- Consensus pattern: E-R-x-[LIVMFA]-x(3)-[LIVMF]-x-G-[GSA]-C-x-[IVT]-P-[LIVMF]-[GSA] [C is the cofactor attachment site]

[1193] [1] Miller A.D., Hart G.J., Packman L.C., Battersby A.R. Biochem. J. 254:915-918(1988).
25 [1194] 441. Presenilin 

Mutations in presenilin-1 are a major cause of early onset Alzheimer's disease [2]. It has been found that presenilin-1 (Swiss:P49768) binds to beta-catenin in vivo [4]. This family also contains SPE proteins from C. elegans.
Number of members: 23
[1]

Medline: 98045995

Presenilins and Alzheimer's disease.
Kim TW, Tanzi RE;
Curr Opin Neurobiol 1997;7:683-688.
[2] Medline: 98045995
Presenilins and Alzheimer's disease.
Kim TW, Tanzi RE;
Curr Opin Neurobiol 1997;7:683-688.
[3] Medline: 98099802
Interaction of presenilins with the filamin family of actin-binding proteins.
Zhang W, Han SW, McKeel DW, Goate A, Wu JY;
J Neurosci 1998;18:914-922.
[4] Medline: 99004850

Destabilization of beta-catenin by mutations in presenilin-1 potentiates neuronal apoptosis.
Zhang Z, Hartmann H, Do VM, Abramowski D, Sturchler-Pierrat C,
Staufenbiel M, Sommer B, van de Wetering M, Clevers H,
Saftig P, De Strooper B, He X, Yankner BA;
Nature 1998;395:698-702.
[1195] 442. (Pribosyltran) Purine/pyrimidine phosphoribosyl transferases signature

Phosphoribosyltransferases (PRT) are enzymes that catalyze the synthesis of beta-n-5'-monophosphates from phosphoribosylpyrophosphate (PRPP) and an enzyme specific amine. A number of PRT's are involved in the biosynthesis of purine, pyrimidine, and pyridine nucleotides, or in the salvage of purines and pyrimidines. These enzymes are:

- Adenine phosphoribosyltransferase (EC 2.4.2.7) (APRT), which is involved in purine salvage.
- Hypoxanthine-guanine or hypoxanthine phosphoribosyltransferase (EC 2.4.2.8) (HGPRT or HPRT), which are involved in purine salvage.
- Orotate phosphoribosyltransferase (EC 2.4.2.10) (OPRT), which is involved in pyrimidine biosynthesis.
- Amido phosphoribosyltransferase (EC 2.4.2.14), which is involved in purine biosynthesis.
- Xanthine-guanine phosphoribosyltransferase (EC 2.4.2.22) (XGPRT), which is involved in purine salvage.

In the sequence of all these enzymes there is a small conserved region which may be involved in the enzymatic activity
and/or be part of the PRPP binding site [1].

- Consensus pattern: [LIVMFYWCTA]-[LIVM]-[LIVMA]-[LIVMFC]-[DE]-D-[LIVMS]-[LIVM]-[STAVD]-[STAR]-[GAC]-x-[STAR]

Note: in position 11 of the pattern most of these enzymes have Gly.

[1196] [1] Hershey H.V., Taylor M.W. Gene 43:287-293(1986).
[1197] 443. (Pro_CA)

Prokaryotic-type carbonic anhydrases signatures

Carbonic anhydrases (EC 4.2.1.1) (CA) are zinc metalloenzymes which catalyze the reversible hydration of carbon dioxide. In Escherichia coli, CA (gene cynT) is involved in recycling carbon dioxide formed in the bicarbonate-dependent decomposition of cyanate by cyanase (gene cynS). By this action, it prevents the depletion of cellular bicarbonate [1]. In photosynthetic bacteria and plant chloroplast, CA is essential to inorganic carbon fixation [2]. Prokaryotic and plant chloroplast CA are structurally and evolutionary related and form a family distinct from the one which groups the many different forms of eukaryotic CA's (see <PDOC00146>). Hypothetical proteins yadF from Escherichia coli and HI1301 from Haemophilus influenzae also belong to this family. Two signature patterns were developed for this family of enzymes. Both patterns contain conserved residues that could be involved in binding zinc (cysteine and histidine).

20 - Consensus pattern: C-[SA]-D-S-R-[LIVM]-x-[AP] 

- Consensus pattern: [EQ]-Y-A-[LIVM]-x(2)-[LIVM]-x(4)-[LIVMF](3)-x-G-H-x(2)-C-G

[1] Guilloton M.B., Korte J.J., Lamblin A.F., Fuchs J.A., Anderson P.M. J. Biol. Chem. 267:3731-3734(1992).
[2] Fukuzawa H., Suzuki E., Komukai Y., Miyachi S. Proc. Natl. Acad. Sci. U.S.A. 89:4437-4441(1992).

[1198] 444. (Prolyl_oligopep)

Prolyl oligopeptidase family serine active site

[1199] The prolyl oligopeptidase family [1,2,3] consist of a number of evolutionary related peptidases whose catalytic activity seems to be provided by a charge relay system similar to that of the trypsin family of serine proteases, but which evolved by independent convergent evolution. The known members of this family are listed below.

- Prolyl endopeptidase (EC 3.4.21.26) (PE) (also called post-proline cleaving enzyme). PE is an enzyme that cleaves peptide bonds on the C-terminal side of prolyl residues. The sequence of PE has been obtained from a mammalian species (pig) and from bacteria (Flavobacterium meningosepticum and Aeromonas hydrophila); there is a high degree of sequence conservation between these sequences.
- Escherichia coli protease II (EC 3.4.21.83) (oligopeptidase B) (gene prtB), which cleaves peptide bonds on the C-terminal side of lysyl and argininyl residues.
- Dipeptidyl peptidase IV (EC 3.4.14.5) (DPP IV). DPP IV is an enzyme that removes N-terminal dipeptides sequentially from polypeptides having unsubstituted N-termini provided that the penultimate residue is proline.
- Yeast vacuolar dipeptidyl aminopeptidase A (DPAP A) (gene: STE13), which is responsible for the proteolytic maturation of the alpha-factor precursor.
- Yeast vacuolar dipeptidyl aminopeptidase B (DPAP B) (gene: DAP2).
- Acylamino-acid-releasing enzyme (EC 3.4.19.1) (acyl-peptide hydrolase). This enzyme catalyzes the hydrolysis of the amino-terminal peptide bond of an N-acetylated protein to generate a N-acetylated amino acid and a protein with a free amino-terminus.

[1200] A conserved serine residue has experimentally been shown (in E. coli protease II as well as in pig and bacterial PE) to be necessary for the catalytic mechanism. This serine, which is part of the catalytic triad (Ser, His, Asp), is generally located about 150 residues away from the C-terminal extremity of these enzymes (which are all proteins that
contain about 700 to 800 amino acids).

[1201] Consensus pattern: D-x(3)-A-x(3)-[LIVMFYW]-x(14)-G-x-S-x-G-G-[LIVMFYW](2) [S is the active site residue]
Sequences known to belong to this class detected by the pattern: ALL, except for yeast DPAP A.
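
The statement that the active-site serine lies about 150 residues from the C-terminus can be checked for a candidate sequence once the pattern has been matched. The Python sketch below is a hedged illustration: it assumes the consensus pattern reads D-x(3)-A-x(3)-[LIVMFYW]-x(14)-G-x-S-x-G-G-[LIVMFYW](2), and the function name `active_site_serine` and the synthetic test sequence are hypothetical.

```python
import re

# Pattern as read from the text (an assumption); the serine is group 1.
ACTIVE_SITE = re.compile(r"D.{3}A.{3}[LIVMFYW].{14}G.(S).GG[LIVMFYW]{2}")

def active_site_serine(seq):
    """Return (1-based serine position, residues following it), or None."""
    m = ACTIVE_SITE.search(seq)
    if m is None:
        return None
    pos = m.start(1)  # 0-based index of the captured serine
    return pos + 1, len(seq) - pos - 1

# Synthetic sequence: 10 leading residues, one motif copy, 150 trailing residues.
motif = "D" + "AAA" + "A" + "AAA" + "L" + "A" * 14 + "G" + "A" + "S" + "A" + "GG" + "LL"
print(active_site_serine("M" * 10 + motif + "K" * 150))
```

The second value returned is the distance to the C-terminal extremity, which can be compared with the figure of roughly 150 residues given in the text.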



[1202] Note: these proteins belong to families S9A/S9B/S9C in the classification of peptidases [4].

[1] Rawlings N.D., Polgar L., Barrett A.J. Biochem. J. 279:907-911(1991).
[2] Barrett A.J., Rawlings N.D.
[3] Polgar L., Szabo E.
[4] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:19-61(1994).

[1203] 445. (Pterin_4a)
Pterin 4 alpha carbinolamine dehydratase
[1204] Pterin 4 alpha carbinolamine dehydratase is also known as DCoH (dimerization cofactor of hepatocyte nuclear factor
1-alpha).

[1205] Number of members: 11

[1206] (1j Crunk ID Eindttr.TtJA &lber ]' Medline 9 "0^6 7 Hiqh-tesolution stiuctuiei <. f the bifunctiona! enzyme 
and tiansirtf tn'naf coaitwatoi DCoH and its complex with <j prooutt analogy** " Ptotein Sci 1^96 5 1 96^- f 9^ 2 
[1207] 446. (Pyridox_oxidase)

Pyridoxamine 5'-phosphate oxidase signature

[1208] Pyridoxamine 5'-phosphate oxidase (EC 1.4.3.5) is a FMN flavoprotein involved in the de novo synthesis of pyridoxine (vitamin B6) and pyridoxal phosphate. It oxidizes pyridoxamine-5-P (PMP) and pyridoxine-5-P (PNP) to pyridoxal-5-P. The sequences of the enzyme from bacterial (genes pdxH or fprA) [1] and fungal (gene PDX3) [2] sources
show that this protein has been highly conserved throughout evolution.

PdxH is evolutionary related [3] to one of the enzymes in the phenazine biosynthesis protein pathway, phzD (also known as phzG). As a signature pattern, a highly conserved region was selected, located in the C-terminal part of these enzymes.



- Consensus pattern: [LIVP]-E-F-W-[QHG]-x(4)-R-[LIVM]-H-[DNE]-R
[1] Lam H.-M., Winkler M.E. J. Bacteriol. 174:6033-6045(1992).
[2] Loubbardi A., Karst F., Guilloton M., Marcireau C. J. Bacteriol. 177:1817-1823(1995).
[3] Pierson L.S. III, Gaffney T., Lam S., Gong F. FEMS Microbiol. Lett. 134:299-307(1995).

[1209] 447. (Pyrophosphatase)
Inorganic pyrophosphatase signature

[1210] Inorganic pyrophosphatase (EC 3.6.1.1) (PPase) [1,2] is the enzyme responsible for the hydrolysis of pyrophosphate (PPi), which is formed principally as the product of the many biosynthetic reactions that utilize ATP. All known PPases require the presence of divalent metal cations, with magnesium conferring the highest activity. Among other residues, a lysine has been postulated to be part of or close to the active site. PPases have been sequenced from bacteria such as Escherichia coli (homohexamer), thermophilic bacteria PS-3 and Thermus thermophilus, from the archaebacteria Thermoplasma acidophilum, from fungi (homodimer), from a plant, and from bovine retina. In yeast, a mitochondrial isoform of PPase has been characterized which seems to be involved in energy production and whose activity is stimulated by uncouplers of ATP synthesis.

[1211] The sequences of PPases share some regions of similarities. As a signature pattern, a region was selected
that contains three conserved aspartates that are involved in the binding of cations.

- Consensus pattern: D-[SGDN]-D-[PE]-[LIVMF]-D-[LIVMGAC] [The three D's bind divalent metal cations]

[1] Lahti R., Kolakowski L.F. Jr., Heinonen J., Vihinen M., Pohjanoksa K., Cooperman B.S. Biochim. Biophys. Acta 1038:336-345(1990).
[2] Cooperman B.S., Baykov A.A., Lahti R. Trends Biochem. Sci. 17:262-266(1992).

[1212] 448. (Peptidase_S26)
Signal peptidases I signatures.

[1213] Signal peptidases (SPases) [1] (also known as leader peptidases) remove the signal peptides from secretory proteins. In prokaryotes three types of SPases are known: type I (gene lepB), which is responsible for the processing of the majority of exported pre-proteins; type II (gene lsp), which only process lipoproteins; and a third type involved in the processing of pili subunits. SPase I (EC 3.4.21.89) is an integral membrane protein that is anchored in the cytoplasmic membrane by one (in B. subtilis) or two (in E. coli) N-terminal transmembrane domains, with the main part of the protein protruding in the periplasmic space. Two residues have been shown [2,3] to be essential for the catalytic activity of SPase I: a serine and a lysine. SPase I is evolutionary related to the yeast mitochondrial inner membrane protease subunit 1 and 2 (genes IMP1 and IMP2), which catalyze the removal of signal peptides required for the targeting of proteins from the mitochondrial matrix, across the inner membrane, into the inter-membrane space [4]. In eukaryotes the removal of signal peptides is effected by an oligomeric enzymatic complex composed of at least five subunits: the signal peptidase complex (SPC). The SPC is located in the endoplasmic reticulum membrane. Two components of mammalian SPC, the 18 Kd (SPC18) and the 21 Kd (SPC21) subunits, as well as the yeast SEC11 subunit have been shown [5] to share regions of sequence similarity with prokaryotic SPases I and yeast IMP1/IMP2. Three signature patterns have been developed for these proteins. The first signature contains the putative active site serine, the second signature contains the putative active site lysine, which is not conserved in the SPC subunits, and the third signature corresponds to a conserved region of unknown biological significance which is located in the C-terminal section of all
these proteins.

[1214] Consensus pattern: [GS]-x-S-M-x-[PS]-[AT]-[LF] [S is an active site residue]
Consensus pattern: K-R-[LIVMSTA](2)-G-x-[PG]-G-[DE]-x-[LIVM]-x-[LIVMFY] [K is an active site residue]
Consensus pattern: [LIVMFYW](2)-x(2)-G-D-[NH]-x(3)-[SND]-x(2)-[SG]

[1215] [ 1] Dalbey R.E., von Heijne G. Trends Biochem. Sci. 17:474-478(1992). [ 2] Sung M., Dalbey R.E. J. Biol. Chem. 267:13154-13159(1992). [ 3] Black M.T. J. Bacteriol. 175:4957-4961(1993). [ 4] Nunnari J., Fox T.D., Walter P. Science 262:1997-2004(1993). [ 5] van Dijl J.M., de Jong A., Vehmaanpera J., Venema G., Bron S. EMBO J. 11:2819-2828(1992). [ 6] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:19-61(1994).

[1216] 449. (Peptidase C1) Eukaryotic thiol (cysteine) proteases active sites. Eukaryotic thiol proteases (EC 3.4.22.-) [1] are a family of proteolytic enzymes which contain an active site cysteine. Catalysis proceeds through a thioester intermediate and is facilitated by a nearby histidine side chain; an asparagine completes the essential catalytic triad. The proteases which are currently known to belong to this family are listed below (references are only provided for recently determined sequences). - Vertebrate lysosomal cathepsins B (EC 3.4.22.1), H (EC 3.4.22.16), L (EC 3.4.22.15), and S (EC 3.4.22.27) [2]. - Vertebrate lysosomal dipeptidyl peptidase I (EC 3.4.14.1) (also known as cathepsin C) [2]. - Vertebrate calpains (EC 3.4.22.17). Calpains are intracellular calcium-activated thiol proteases that contain both an N-terminal catalytic domain and a C-terminal calcium-binding domain. - Mammalian cathepsin K, which seems involved in osteoclastic bone resorption [3]. - Human cathepsin O [4]. - Bleomycin hydrolase, an enzyme that catalyzes the inactivation of the antitumor drug BLM (a glycopeptide). - Plant enzymes: barley aleurain (EC 3.4.22.16), EP-B1/B4; kidney bean EP-C1; rice bean SH-EP; kiwi fruit actinidin (EC 3.4.22.14); papaya latex papain (EC 3.4.22.2), chymopapain (EC 3.4.22.6), caricain (EC 3.4.22.30), and proteinase IV (EC 3.4.22.25); pea turgor-responsive protein 15A; pineapple stem bromelain (EC 3.4.22.32); rape COT44; rice oryzain alpha, beta, and gamma; tomato low-temperature induced; Arabidopsis thaliana A494, RD19A and RD21A. - House-dust mites allergens DerP1 and EurM1. - Cathepsin B-like proteinases from the worms Caenorhabditis elegans (genes gcp-1, cpr-3, cpr-4, cpr-5 and cpr-6), Schistosoma mansoni (antigen SM31) and japonica (antigen SJ31), Haemonchus contortus (genes AC-1 and AC-2), and Ostertagia ostertagi (CP-1 and CP-3). - Slime mold cysteine proteinases CP1 and CP2. - Cruzipain from Trypanosoma cruzi and brucei. - Trophozoite cysteine proteinase (TCP) from various Plasmodium species. - Proteases from Leishmania mexicana, Theileria annulata and Theileria parva. - Baculoviruses cathepsin-like enzyme (v-cath). - Drosophila small optic lobes protein (gene sol), a neuronal protein that contains a calpain-like domain. - Yeast thiol protease BLH1/YCP1/LAP3. - Caenorhabditis elegans hypothetical protein C06G4.2, a calpain-like protein. Two bacterial peptidases are also part of this family: - Aminopeptidase C from Lactococcus lactis (gene pepC) [5]. - Thiol protease tpr from Porphyromonas gingivalis. Three other proteins are structurally related to this family, but may have lost their proteolytic activity. - Soybean oil body protein P34. This protein has its active site cysteine replaced by a glycine. - Rat testin, a Sertoli cell secretory protein highly similar to cathepsin L but with the active site cysteine replaced by a serine. Rat testin should not be confused with mouse testin, which is a LIM-domain protein (see PDOC00382). - Plasmodium falciparum serine-repeat protein (SERA), the major blood stage antigen. This protein of 111 Kd possesses a C-terminal thiol-protease-like domain [6], but the active site cysteine is replaced by a serine. The sequences around the three active site residues are well conserved and can be used as signature patterns.

[1217] Consensus pattern: Q-x(3)-[GE]-x-C-[YW]-x(2)-[STAGC]-[STAGCV] [C is the active site residue]. Note: the residue in position 4 of the pattern is almost always cysteine; the only exceptions are calpains (Leu), bleomycin hydrolase (Ser) and yeast YCP1 (Ser). Note: the residue in position 5 of the pattern is always Gly except in papaya protease IV where it is Glu.

[1218] Consensus pattern: [LIVMGSTAN]-x-H-[GSACE]-[LIVM]-x-[LIVMAT](2)-G-x-[GSADNH] [H is the active site residue].

Consensus pattern: [FYCH]-[WI]-[LIVT]-x-[KRQAG]-N- [N is the active site residue]. Note: these proteins belong to family C1 (papain-type) and C2 (calpains) in the classification of peptidases [7,E1].

[1219] [ 1] Dufour E. Biochimie 70:1335-1342(1988). [ 2] Kirschke H., Barrett A.J., Rawlings N.D. Protein Prof. 2:1587-1643(1995). [ 3] Shi G.-P., Chapman H.A., Bhairi S.M., Deleeuw C., Reddy V.Y., Weiss S.J. FEBS Lett. 357:129-134(1995). [ 4] Velasco G., Ferrando A.A., Puente X.S., Sanchez L.M., Lopez-Otin C. J. Biol. Chem. 269:27136-27142(1994). [ 5] Chapot-Chartier M.P., Nardi M., Chopin M.C., Chopin A., Gripon J.C. Appl. Environ. Microbiol. 59:330-333(1993). [ 6] Higgins D.G., McConnell D.J., Sharp P.M. Nature 340:604-604(1989). [ 7] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:461-488(1994).

[1220] 450. (peptidase M24) Aminopeptidase P and proline dipeptidase signature (1)

Aminopeptidase P (EC 3.4.11.9) is the enzyme responsible for the release of any N-terminal amino acid adjacent to a proline residue. Proline dipeptidase (EC 3.4.13.9) (prolidase) splits dipeptides with a prolyl residue in the carboxyl terminal position. Bacterial aminopeptidase P II (gene pepP) [1], proline dipeptidase (gene pepQ) [2], and human proline dipeptidase (gene PEPD) [3] are evolutionarily related. These proteins are manganese metalloenzymes. Yeast hypothetical proteins YER078c and YFR006w and Mycobacterium tuberculosis hypothetical protein MtCY49.29c also belong to this family. As a signature pattern for these enzymes, a conserved region that contains three histidine residues has been developed.

[1221] Consensus pattern: [HA]-[GSYR]-[LIVMT]-[SG]-H-x-[LIV]-G-[LIVM]-x-[IV]-H-[DE]
[1222] [ 1] Yoshimoto T., Tone H., Honda T., Osatomi K., Kobayashi R., Tsuru D. J. Biochem. 105:412-416(1989).
[ 2] Nakahigashi K., Inokuchi H. Nucleic Acids Res. 18:6439-6439(1990).
[ 3] Endo F., Tanoue A., Nakai H., Hata A., Indo Y., Titani K., Matsuda I. J. Biol. Chem. 264:4476-4481(1989).
[ 4] Rawlings N.D., Barrett A.J. Meth. Enzymol. 248:183-228(1995).

[1223] Methionine aminopeptidase signatures (2). Methionine aminopeptidase (EC 3.4.11.18) (MAP) is responsible for the removal of the amino-terminal (initiator) methionine from nascent eukaryotic cytosolic and cytoplasmic prokaryotic proteins if the penultimate amino acid is small and uncharged. All MAP studied to date are monomeric proteins that require cobalt ions for activity. Two subfamilies of MAP enzymes are known to exist [1,2]. While being evolutionarily related, they only share a limited amount of sequence similarity, mostly clustered around the residues shown, in the Escherichia coli MAP [3], to be involved in cobalt-binding. The first family consists of enzymes from prokaryotes as well as eukaryotic MAP-1, while the second group is made up of archaebacterial MAP and eukaryotic MAP-2. The second subfamily also includes proteins which do not seem to be MAP, but that are clearly evolutionarily related, such as mouse proliferation-associated protein 1 and fission yeast curved DNA-binding protein. For each of these subfamilies, a specific signature pattern that includes residues known to be involved in cobalt-binding has been developed.
[1224] Consensus pattern: [MFY]-x-G-H-G-[LIVMC]-[GSH]-x(3)-H-x(4)-[LIVM]-x-[HN]-[YWV] [H is a cobalt ligand]
Consensus pattern: [DA]-[LIVMY]-x-K-[LIVM]-D-x-G-x-[HQ]-[LIVM]-[DNS]-G-x(3)-[DN] [The second D and the last D/N are cobalt ligands]

[1225] [ 1] Arfin S.M., Kendall R.L., Hall L., Weaver L.H., Stewart A.E., Matthews B.W., Bradshaw R.A. Proc. Natl. Acad. Sci. U.S.A. 92:7714-7718(1995). [ 2] Keeling P.J., Doolittle W.F. Trends Biochem. Sci. 21:285-286(1996). [ 3] Roderick S.L., Matthews B.W. Biochemistry 32:3907-3912(1993). [ 4] Rawlings N.D., Barrett A.J. Meth. Enzymol. 248:183-228(1995).

[1226] 451. Cytochrome P450 cysteine heme-iron ligand signature

Cytochrome P450's [1,2,3,E1] are a group of enzymes involved in the oxidative metabolism of a high number of natural compounds (such as steroids, fatty acids, prostaglandins, leukotrienes, etc.) as well as drugs, carcinogens and mutagens. Based on sequence similarities, P450's have been classified into about forty different families [4,5]. P450's are proteins of 400 to 530 amino acids; the only exception is Bacillus BM-3 (CYP102), which is a protein of 1048 residues that contains an N-terminal P450 domain followed by a reductase domain. P450's are heme proteins. A conserved cysteine residue in the C-terminal part of P450's is involved in binding the heme iron in the fifth coordination site. From a region around this residue, a ten-residue signature specific to P450's was developed.
[1227] Consensus pattern: [FW]-[SGNH]-x-[GD]-x-[RHPT]-x-C-[LIVMFAP]-[GAD] [C is the heme iron ligand]
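
The statement that the signature spans ten residues can be checked mechanically by counting the positions of the pattern. The Python sketch below is illustrative only: `pattern_length` is a hypothetical helper introduced for this example, and the pattern string is the P450 consensus as read from the paragraph above.

```python
import re

def pattern_length(pattern: str) -> tuple[int, int]:
    """Return the (minimum, maximum) number of residues a PROSITE-style pattern spans."""
    shortest = longest = 0
    for element in pattern.strip("-. ").split("-"):
        # An element may carry a repeat suffix such as (2) or (5,18).
        repeat = re.search(r"\((\d+)(?:,(\d+))?\)$", element)
        if repeat:
            lo = int(repeat.group(1))
            hi = int(repeat.group(2)) if repeat.group(2) else lo
        else:
            lo = hi = 1  # a plain element covers exactly one residue
        shortest += lo
        longest += hi
    return shortest, longest

# Pattern string as read from the text (assumed reading).
P450_SIGNATURE = "[FW]-[SGNH]-x-[GD]-x-[RHPT]-x-C-[LIVMFAP]-[GAD]"
print(pattern_length(P450_SIGNATURE))
```

For a pattern containing a variable-length gap such as x(5,18), the two returned values differ, which gives the shortest and longest sequence window a match can occupy.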

[ 1] Nebert D.W., Gonzalez F.J. Annu. Rev. Biochem. 56:945-993(1987).

[ 2] Coon M.J., Ding X., Pernecky S.J., Vaz A.D.N. FASEB J. 6:669-673(1992).

[ 3] Guengerich F.P. J. Biol. Chem. 266:10019-10022(1991).

[ 4] Nelson D.R., Kamataki T., Waxman D.J., Guengerich F.P., Estabrook R.W., Feyereisen R., Gonzalez F.J., Coon M.J., Gunsalus I.C., Gotoh O., Okuda K., Nebert D.W. DNA Cell Biol. 12:1-51(1993).

[ 5] Degtyarenko K.N., Archakov A.I. FEBS Lett. 332:1-8(1993).

[1228] 452. (Pec Lyase) Pectate lyase

This enzyme forms a right-handed beta helix structure. Pectate lyase is an enzyme involved in the maceration and soft rotting of plant tissue.

[1229] [ 1] Yoder M.D., Keen N.T., Jurnak F. Science 260:1503-1507(1993).

[1230] 453. (pep M24) Aminopeptidase P and proline dipeptidase signature (pep1)

Aminopeptidase P (EC 3.4.11.9) is the enzyme responsible for the release of any N-terminal amino acid adjacent to a proline residue. Proline dipeptidase (EC 3.4.13.9) (prolidase) splits dipeptides with a prolyl residue in the carboxyl terminal position. Bacterial aminopeptidase P II (gene pepP) [1], proline dipeptidase (gene pepQ) [2], and human proline dipeptidase (gene PEPD) [3] are evolutionarily related. These proteins are manganese metalloenzymes. Yeast hypothetical proteins YER078c and YFR006w and Mycobacterium tuberculosis hypothetical protein MtCY49.29c also belong to this family. As a signature pattern for these enzymes, a conserved region was selected that contains three histidine residues.

[1231] Consensus pattern: [HA]-[GSYR]-[LIVMT]-[SG]-H-x-[LIV]-G-[LIVM]-x-[IV]-H-[DE]

[ 1] Yoshimoto T., Tone H., Honda T., Osatomi K., Kobayashi R., Tsuru D. J. Biochem. 105:412-416(1989).
[ 2] Nakahigashi K., Inokuchi H. Nucleic Acids Res. 18:6439-6439(1990).
[ 3] Endo F., Tanoue A., Nakai H., Hata A., Indo Y., Titani K., Matsuda I. J. Biol. Chem. 264:4476-4481(1989).
[ 4] Rawlings N.D., Barrett A.J. Meth. Enzymol. 248:183-228(1995).

[1232] Methionine aminopeptidase signatures (pep2)

Methionine aminopeptidase (EC 3.4.11.18) (MAP) is responsible for the removal of the amino-terminal (initiator) methionine from nascent eukaryotic cytosolic and cytoplasmic prokaryotic proteins if the penultimate amino acid is small and uncharged. All MAP studied to date are monomeric proteins that require cobalt ions for activity. Two subfamilies of MAP enzymes are known to exist [1,2]. While being evolutionarily related, they only share a limited amount of sequence similarity, mostly clustered around the residues shown, in the Escherichia coli MAP [3], to be involved in cobalt-binding. The first family consists of enzymes from prokaryotes as well as eukaryotic MAP-1, while the second group is made up of archaebacterial MAP and eukaryotic MAP-2. The second subfamily also includes proteins which do not seem to be MAP, but that are clearly evolutionarily related, such as mouse proliferation-associated protein 1 and fission yeast curved DNA-binding protein. For each of these subfamilies, a specific signature pattern was developed that includes residues known to be involved in cobalt-binding.

[1233] Consensus pattern: [MFY]-x-G-H-G-[LIVMC]-[GSH]-x(3)-H-x(4)-[LIVM]-x-[HN]-[YWV] [H is a cobalt ligand]
Consensus pattern: [DA]-[LIVMY]-x-K-[LIVM]-D-x-G-x-[HQ]-[LIVM]-[DNS]-G-x(3)-[DN] [The second D and the last D/N are cobalt ligands]

[ 1] Arfin S.M., Kendall R.L., Hall L., Weaver L.H., Stewart A.E., Matthews B.W., Bradshaw R.A. Proc. Natl. Acad. Sci. U.S.A. 92:7714-7718(1995).
[ 2] Keeling P.J., Doolittle W.F. Trends Biochem. Sci. 21:285-286(1996).
[ 3] Roderick S.L., Matthews B.W. Biochemistry 32:3907-3912(1993).
[ 4] Rawlings N.D., Barrett A.J. Meth. Enzymol. 248:183-228(1995).

[1234] 454. Peroxidases signatures

Peroxidases (EC 1.11.1.-) [1] are heme-binding enzymes that carry out a variety of biosynthetic and degradative functions using hydrogen peroxide as the electron acceptor. Peroxidases are widely distributed throughout bacteria, fungi, plants, and vertebrates. In peroxidases the heme prosthetic group is protoporphyrin IX and the fifth ligand of the heme iron is a histidine (known as the proximal histidine). Another histidine residue (the distal histidine) serves as an acid-base catalyst in the reaction between hydrogen peroxide and the enzyme. The regions around these two active site residues are more or less conserved in a majority of peroxidases [2,3]. The enzymes in which one or both of these regions can be found are listed below. - Yeast cytochrome c peroxidase (EC 1.11.1.5). - Myeloperoxidase (EC 1.11.1.7) (MPO). MPO is found in granulocytes and monocytes and plays a major role in the oxygen-dependent microbicidal system of neutrophils. - Lactoperoxidase (EC 1.11.1.7) (LPO). LPO is a milk protein which acts as an antimicrobial agent. - Eosinophil peroxidase (EC 1.11.1.7) (EPO). An enzyme found in the cytoplasmic granules of eosinophils. - Thyroid peroxidase (EC 1.11.1.8) (TPO). TPO plays a central role in the biosynthesis of thyroid hormones. It catalyzes the iodination and coupling of the hormonogenic tyrosines in thyroglobulin to yield the thyroid hormones T3 and T4. - Fungal ligninases. Ligninase catalyzes the first step in the degradation of lignin. It depolymerizes lignin by catalyzing the C(alpha)-C(beta) cleavage of the propyl side chains of lignin. - Plant peroxidases (EC 1.11.1.7). Plants express a large number of isozymes of peroxidases. Some of them play a role in cell suberization by catalyzing the deposition of the aromatic residues of suberin on the cell wall, some are expressed as a defense response toward wounding, and others are involved in the metabolism of auxin and the biosynthesis of lignin. - Prokaryotic catalase-peroxidases. Some bacterial species produce enzymes that exhibit both catalase and broad-spectrum peroxidase activities [4]. Examples of such enzymes are catalase HP I from Escherichia coli (gene katG) and perA from Bacillus stearothermophilus.
[1235] Consensus pattern: [DET]-[LIVMTA]-x(2)-[LIVM]-[LIVMSTAG]-[SAG]-[LIVMSTAG]-H-[STA]-[LIVMFY] [H is the proximal heme-binding ligand]

Consensus pattern: [SGATV]-x(3)-[LIVMA]-R-[LIVMA]-x-[FW]-H-x-[SAC] [H is an active site residue]

[ 1] Dawson J.H. Science 240:433-439(1988).

[ 2] Kimura S., Ikeda-Saito M. Proteins 3:113-120(1988).

[ 3] Henrissat B., Saloheimo M., Lavaitte S., Knowles J.K.C. Proteins 8:251-257(1990).
[ 4] Welinder K.G. Biochim. Biophys. Acta 1080:215-220(1991).

[1236] 455. pfkB family of carbohydrate kinases signatures

It has been shown [1,2,3] that the following carbohydrate and purine kinases are evolutionarily related and can be grouped into a single family, which is known [1] as the 'pfkB family': - Fructokinase (EC 2.7.1.4) (gene scrK). - 6-phosphofructokinase isozyme 2 (EC 2.7.1.11) (phosphofructokinase-2) (gene pfkB). pfkB is a minor phosphofructokinase isozyme in Escherichia coli and is not evolutionarily related to the major isozyme (gene pfkA). Plant 6-phosphofructokinases also belong to this family. - Ribokinase (EC 2.7.1.15) (gene rbsK). - Adenosine kinase (EC 2.7.1.20) (gene ADK). - 2-dehydro-3-deoxygluconokinase (EC 2.7.1.45) (gene kdgK). - 1-phosphofructokinase (EC 2.7.1.56) (fructose 1-phosphate kinase) (gene fruK). - Inosine-guanosine kinase (EC 2.7.1.73) (gene gsk). - Tagatose-6-phosphate kinase (EC 2.7.1.144) (phosphotagatokinase) (gene lacC). - Escherichia coli hypothetical protein yeiC. - Escherichia coli hypothetical protein yeiI. - Escherichia coli hypothetical protein yhfQ. - Escherichia coli hypothetical protein yihV. - Bacillus subtilis hypothetical protein yxdC. - Yeast hypothetical protein YJR105w. All the above kinases are proteins of 280 to 430 amino acid residues that share a few regions of sequence similarity. Two of these regions were selected as signature patterns. The first pattern is based on a region rich in glycine which is located in the N-terminal section of these enzymes, while the second pattern is based on a conserved region in the C-terminal section.

[1237] Consensus pattern: [AG]-G-x(0,1)-[GAP]-x-N-x-[STA]-x(6)-[GS]-x(9)-G

Consensus pattern: [DNSK]-[PSTV]-x-[SAG](2)-[GD]-D-x(3)-[SAGV]-[AG]-[LIVMFYA]-[LIVMSTAP]

[ 1] Wu L.-F., Reizer A., Reizer J., Cai B., Tomich J.M., Saier M.H. Jr. J. Bacteriol. 173:3117-3127(1991).
[ 2] Orchard L.M.D., Kornberg H.L. Proc. R. Soc. Lond., B, Biol. Sci. 242:87-90(1990).
[ 3] Blatch G.L., Scholle R.R., Woods D.R. Gene 95:17-23(1990).

[1238] 456. Phospholipase A2 active sites signatures

Phospholipase A2 (EC 3.1.1.4) (PA2) [1,2] is an enzyme which releases fatty acids from the second carbon group of glycerol. PA2's are small and rigid proteins of 120 amino-acid residues that have four to seven disulfide bonds. PA2 binds a calcium ion which is required for activity. The side chains of two conserved residues, a histidine and an aspartic acid, participate in a 'catalytic network'. Many PA2's have been sequenced from snakes, lizards, bees and mammals. In the latter, there are at least four forms: pancreatic, membrane-associated, as well as two less characterized forms. The venom of most snakes contains multiple forms of PA2. Some of them are presynaptic neurotoxins which inhibit neuromuscular transmission by blocking acetylcholine release from the nerve termini. Two different signature patterns were derived for PA2's. The first is centered on the active site histidine and contains three cysteines involved in disulfide bonds. The second is centered on the active site aspartic acid and also contains three cysteines involved in disulfide bonds.

[1239] Consensus pattern: C-C-x(2)-H-x(2)-C [H is the active site residue]. This pattern will not detect some snake toxins homologous with PA2 but which have lost their catalytic activity, as well as otoconin-22, a Xenopus protein from the aragonitic otoconia which is also unlikely to be enzymatically active.

Consensus pattern: [LIVMA]-C-{LIVMFYWPCST}-C-D-x(5)-C [D is the active site residue]. This pattern detects the majority of functional and non-functional PA2's. Undetected sequences are bee PA2, gila monster PA2's, PA2 PL-X from habu and PA2 PA-5 from mulga.

[ 1] Davidson F.F., Dennis E.A. J. Mol. Evol. 31:228-238(1990).

[ 2] Gomez F., Vandermeers A., Vandermeers-Piret M.-C., Herzog R., Rathe J., Stievenart M., Winand J., Christophe J. Eur. J. Biochem. 186:23-33(1989).

[1240] 457. Phosphorylase pyridoxal-phosphate attachment site. Phosphorylases (EC 2.4.1.1) [1] are important allosteric enzymes in carbohydrate metabolism. They catalyze the formation of glucose 1-phosphate from polyglucose such as glycogen, starch or maltodextrin. Enzymes from different sources differ in their regulatory mechanisms and their natural substrates. However, all known phosphorylases share catalytic and structural properties. They are pyridoxal-phosphate dependent enzymes; the pyridoxal-P group is attached to a lysine residue around which the sequence is highly conserved and can be used as a signature pattern to detect this class of enzymes.
[1241] Consensus pattern: E-A-[SC]-G-x-[GS]-x-M-K-x(2)-[LM]-N [K is the pyridoxal-P attachment site]
[ 1] Fukui T., Shimomura S., Nakano K. Mol. Cell. Biochem. 42:129-144(1982).
[1242] 458. Protein kinases signatures and profile

Eukaryotic protein kinases [1 to 5] are enzymes that belong to a very extensive family of proteins which share a conserved catalytic core common to both serine/threonine and tyrosine protein kinases. There are a number of conserved regions in the catalytic domain of protein kinases. Two of these regions were selected to build signature patterns. The first region, which is located in the N-terminal extremity of the catalytic domain, is a glycine-rich stretch of residues in the vicinity of a lysine residue, which has been shown to be involved in ATP binding. The second region, which is located in the central part of the catalytic domain, contains a conserved aspartic acid residue which is important for the catalytic activity of the enzyme [6]. Two signature patterns were derived for that region, one specific for serine/threonine kinases and the other for tyrosine kinases. A profile was also developed which is based on the alignment in [1] and covers the entire catalytic domain.

[1243] Consensus pattern: [LIV]-G-{P}-G-{P}-[FYWMGSTNH]-[SGA]-{PW}-[LIVCAT]-{PD}-x-[GSTACLIVMFY]-x(5,18)-[LIVMFYWCSTAR]-[AIVP]-[LIVMFAGCKR]-K [K binds ATP]. The majority of known protein kinases belong to the class detected by this pattern, but it fails to find a number of them, especially viral kinases, which are quite divergent in this region and are completely missed by this pattern.

Consensus pattern: [LIVMFYC]-x-[HY]-x-D-[LIVMFY]-K-x(2)-N-[LIVMFYCT](3) [D is an active site residue]. Most serine/threonine specific protein kinases belong to the class detected by this pattern, with 10 exceptions (half of them viral kinases) and also Epstein-Barr virus BGLF4 and Drosophila ninaC, which have respectively Ser and Arg instead of the conserved Lys and which are therefore detected by the tyrosine kinase specific pattern described below.
[1244] Consensus pattern: [LIVMFYC]-x-[HY]-x-D-[LIVMFY]-[RSTAC]-x(2)-N-[LIVMFYC](3) [D is an active site residue]. All tyrosine specific protein kinases, with the exception of human ERBB3 and mouse blk, belong to the class detected by this pattern. This pattern will also detect most bacterial aminoglycoside phosphotransferases [8,9] and herpesviruses ganciclovir kinases [10], which are proteins structurally and evolutionarily related to protein kinases. The profile also detects receptor guanylate cyclases and 2-5A-dependent ribonucleases. Sequence similarities between these two families and the eukaryotic protein kinase family have been noticed before. It also detects Arabidopsis thaliana kinase-like protein TMKL1, which seems to have lost its catalytic activity. If a protein analyzed includes the two protein kinase signatures, the probability of it being a protein kinase is close to 100%. Eukaryotic-type protein kinases have also been found in prokaryotes such as Myxococcus xanthus [11] and Yersinia pseudotuberculosis.
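
The two-signature criterion described above (an ATP-binding region hit together with one of the two active-site region hits) can be sketched in Python as follows. This is a minimal illustration and not the method used to build the patterns or the profile: the regular expressions are hand translations of the three consensus patterns as read from the text, and the function name `classify_kinase` and both test sequences are invented for this example.

```python
import re

# Hand-translated regular expressions for the three consensus patterns.
# {..} (any residue except those listed) becomes [^..]; x(5,18) becomes .{5,18}.
ATP_BINDING = re.compile(
    r"[LIV]G[^P]G[^P][FYWMGSTNH][SGA][^PW][LIVCAT][^PD]."
    r"[GSTACLIVMFY].{5,18}[LIVMFYWCSTAR][AIVP][LIVMFAGCKR]K"
)
SER_THR_ACTIVE_SITE = re.compile(r"[LIVMFYC].[HY].D[LIVMFY]K.{2}N[LIVMFYCT]{3}")
TYR_ACTIVE_SITE = re.compile(r"[LIVMFYC].[HY].D[LIVMFY][RSTAC].{2}N[LIVMFYC]{3}")

def classify_kinase(sequence: str) -> str:
    """Report which pair of kinase signatures, if any, occurs in the sequence."""
    has_atp = ATP_BINDING.search(sequence) is not None
    if has_atp and SER_THR_ACTIVE_SITE.search(sequence):
        return "serine/threonine kinase signatures"
    if has_atp and TYR_ACTIVE_SITE.search(sequence):
        return "tyrosine kinase signatures"
    return "both signatures not found"

# Invented toy sequences, for illustration only.
toy_ser_thr = "MSLGTGSFGRVMLVKHKETGNHYAMKAAAALIYRDLKPENLLIDQ"
toy_tyr = "MSLGTGSFGRVMLVKHKETGNHYAMKAAAALVHRDLRAANILVDQ"

print(classify_kinase(toy_ser_thr))
print(classify_kinase(toy_tyr))
```

Requiring both signatures trades sensitivity for specificity: divergent kinases that miss the ATP-binding pattern, such as the viral kinases mentioned above, would be reported as not found by this sketch.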

[ 1] Hanks S.K., Hunter T. FASEB J. 9:576-596(1995).
[ 2] Hunter T. Meth. Enzymol. 200:3-37(1991).
[ 3] Hanks S.K., Quinn A.M. Meth. Enzymol. 200:38-62(1991).
[ 4] Hanks S.K. Curr. Opin. Struct. Biol. 1:369-383(1991).

[ 5] Hanks S.K., Quinn A.M., Hunter T. Science 241:42-52(1988).

[ 6] Knighton D.R., Zheng J., Ten Eyck L.F., Ashford V.A., Xuong N.-H., Taylor S.S., Sowadski J.M. Science 253:407-414(1991).

[ 7] Bairoch A., Claverie J.-M. Nature 331:22(1988).
[ 8] Benner S. Nature 329:21-21(1987).

[ 9] Kirby R. J. Mol. Evol. 30:489-492(1992).

[10] Littler E., Stuart A.D., Chee M.S. Nature 358:160-162(1992).

[11] Munoz-Dorado J., Inouye S., Inouye M. Cell 67:995-1006(1991).

[1245] Receptor tyrosine kinase class II signature

A number of growth factors stimulate mitogenesis by interacting with a family of cell surface receptors which possess an intrinsic, ligand-sensitive, protein tyrosine kinase activity [1]. These receptor tyrosine kinases (RTK) all share the same topology: an extracellular ligand-binding domain, a single transmembrane region and a cytoplasmic kinase domain. However they can be classified into at least five groups. The prototype for class II RTK's is the insulin receptor, a heterotetramer of two alpha and two beta chains linked by disulfide bonds. The alpha and beta chains are cleavage products of a precursor molecule. The alpha chain contains the ligand binding site; the beta chain traverses the membrane and contains the tyrosine protein kinase domain. The receptors currently known to belong to class II are: - Insulin receptor from vertebrates. - Insulin growth factor I receptor from mammals. - Insulin receptor-related receptor (IRR), which is most probably a receptor for a peptide belonging to the insulin family. - Insect insulin-like receptors. - Molluscan insulin-related peptide(s) receptor (MIP-R). - Insulin-like peptide receptor from Branchiostoma lanceolatum. - The Drosophila developmental protein sevenless, a putative receptor for positional information required for the formation of the R7 photoreceptor cells. - The trk family of receptors (NTRK1, NTRK2 and NTRK3), which are high affinity receptors for nerve growth factor and related neurotrophic factors (BDNF and NT-3). And the following uncharacterized receptors: - ROS. - LTK (TYK1). - EDDR1 (cak, TRKE, RTK6). - NTRK3 (Tyro10, TKT). - A sponge putative receptor tyrosine kinase. While only the insulin and the insulin growth factor I receptors are known to exist in the tetrameric conformation specific to class II RTK's, all the above proteins share extensive homologies in their kinase domain, especially around the putative site of autophosphorylation. Hence, a signature pattern was developed for this class of RTK's which includes the tyrosine residue that is itself probably autophosphorylated.
[1246] Consensus pattern: [DN]-[LIV]-Y-x(3)-Y-Y-R [The second Y is the autophosphorylation site]
[1247] [ 1] Yarden Y., Ullrich A. Annu. Rev. Biochem. 57:443-478(1988).
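
Because the class II pattern marks a specific residue (the second Y) as the autophosphorylation site, a scan can report the position of that residue rather than only the presence of a hit. The Python sketch below assumes a hand translation of the consensus pattern [DN]-[LIV]-Y-x(3)-Y-Y-R into a regular expression with one capture group; the function name `autophosphorylation_sites` and the test peptide are invented for illustration.

```python
import re

# [DN]-[LIV]-Y-x(3)-Y-Y-R, with the second Y of the pattern captured as group 1.
CLASS_II_RTK = re.compile(r"[DN][LIV]Y.{3}(Y)YR")

def autophosphorylation_sites(sequence: str) -> list[int]:
    """Return the 1-based position of the second pattern Y for every hit."""
    return [m.start(1) + 1 for m in CLASS_II_RTK.finditer(sequence)]

toy_peptide = "MKDIYETDYYRKG"  # invented peptide, for illustration only
print(autophosphorylation_sites(toy_peptide))
```

Positions are reported 1-based to follow the usual residue-numbering convention; a sequence without a hit yields an empty list.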
[1248] Receptor tyrosine kinase class III signature

A number of growth factors stimulate mitogenesis by interacting with a family of cell surface receptors which possess an intrinsic, ligand-sensitive, protein tyrosine kinase activity [1]. These receptor tyrosine kinases (RTK) all share the same topology: an extracellular ligand-binding domain, a single transmembrane region and a cytoplasmic kinase domain. However they can be classified into at least five groups. The class III RTK's are characterized by the presence of five to seven immunoglobulin-like domains [2] in their extracellular section. Their kinase domain differs from that of other RTK's by the insertion of a stretch of 70 to 100 hydrophilic residues in the middle of this domain. The receptors currently known to belong to class III are: - Platelet-derived growth factor receptor (PDGF-R). PDGF-R exists as a homo- or heterodimer of two related chains: alpha and beta [3]. - Macrophage colony stimulating factor receptor (CSF-1-R) (also known as the fms oncogene). - Stem cell factor (mast cell growth factor) receptor (also known as the kit oncogene). - Vascular endothelial growth factor (VEGF) receptors Flt-1 and Flk-1/KDR [4]. - Fl cytokine receptor Flk-2/Flt-3 [5]. - The putative receptor Flt-4 [6]. A signature pattern was developed for this class of RTK's which is based on a conserved region in the kinase domain.

[1249] Consensus pattern: G-x-H-x-N-[LIVM]-V-N-L-L-G-A-C-T

[ 1] Yarden Y., Ullrich A. Annu. Rev. Biochem. 57:443-478(1988).
[ 2] Hunkapiller T., Hood L. Adv. Immunol. 44:1-63(1989).
[ 3] Lee K.-H., Bowen-Pope D.F., Reed R.R. Mol. Cell. Biol. 10:2237-2246(1990).

[ 4] Terman B.I., Dougher-Vermazen M., Carrion M.E., Dimitrov D., Armellino D.C., Gospodarowicz D., Boehlen P. Biochem. Biophys. Res. Commun. 187:1579-1586(1992).

[ 5] Lyman S.D., James L., Vanden Bos T., de Vries P., Brasel K., Gliniak B., Hollingsworth L.T., Picha K.S., McKenna H.J., Splett R.R. Cell 75:1157-1167(1993).
[ 6] Galland F., Karamysheva A., Pebusque M.J., Borg J.P., Rottapel R., Dubreuil P., Rosnet O., Birnbaum D. Oncogene 8:1233-1240(1993).

[1250] Receptor tyrosine kinase class V signatures

A number of growth factors stimulate mitogenesis by interacting with a family of cell surface receptors which possess an intrinsic, ligand-sensitive, protein tyrosine kinase activity [1]. These receptor tyrosine kinases (RTK) all share the same topology: an extracellular ligand-binding domain, a single transmembrane region and a cytoplasmic kinase domain. However they can be classified into at least five groups on the basis of sequence similarities. The extracellular domain of class V RTK's consists of a region of about 600 amino acids, containing 16 conserved cysteines probably involved in disulfide bonds; this region is followed by two copies of a fibronectin type III domain. The ligands for these receptors are proteins of about 200 to 300 residues collectively known as Ephrins. The receptors currently known to belong to class V are [2,3,E1]: - EPHA1 (Eph-1; Esk). - EPHA2 (Eck; Mpk-5; Sek-2). - EPHA3 (Etk1; Hek; Mek4; Tyro4; Rek4; Cek4). - EPHA4 (Sek; Hek8; Mpk-3; Cek8). - EPHA5 (Ehk-1; Hek7; Bsk; Cek7). - EPHA6 (Ehk-2). - EPHA7 (Ehk-3; Hek11; Mdk-1; Ebk). - EPHA8 (Eek). - EPHB1 (Eph-2; Elk; Net). - EPHB2 (Eph-3; Hek5; Drt; Erk; Nuk; Sek-3; Cek5; Qek5). - EPHB3 (Hek-2; Mdk-5). - EPHB4 (Htk; Mdk-2; Myk-1). - EPHB5 (Cek9). The EPHA subtype receptors bind to GPI-anchored ephrins while the EPHB subtype receptors bind to type-I membrane ephrins. Two signature patterns were developed for this class of RTK's, each of which includes some of the conserved cysteine residues.
[12513 Consensus pattern: F~x^DN3~x~[GAV\^GA)-C-(LW 
[PSAWJ [The two C's are probably involved in disulfide bonds] 
Consensus pattern C>x<2)-|E)EKHDEQ]-W-^ 

45 probably involved in disulfide bonds] 

[ 1] Yarden Y., Ullrich A. Annu. Rev. Biochem. 57:443-478(1988).

[ 2] Sajjadi F.G., Pasquale E.B., Subramani S. New Biol. 3:769-778(1991).

[ 3] Wicks I.P., Wilkinson D., Salvaris E., Boyd A.W. Proc. Natl. Acad. Sci. U.S.A. 89:1611-1615(1992).

[1252] 459. Protein kinase C terminal domain
[1253] 460. Plant thionins signature

Thionins are small, basic, plant proteins generally toxic to animal cells [1]. They seem to exert their toxic effect at the
level of the cell membrane but their exact function is not known. They consist of a polypeptide chain of forty five to fifty
amino acids with three to four internal disulfide bonds. They are found in seeds but also in the cell wall of leaves [2].
Thionins are processed from larger precursor proteins [3]. Crambin [4], a hydrophobic plant seed protein, also belongs
to this family. The pattern to detect this family of proteins includes three of the six cysteine residues involved in disulfide
bonds.

[Schematic representation of the thionin sequence showing the conserved cysteines and the position of the pattern]
'C': conserved cysteine involved in a disulfide bond. '*': position of the pattern.

[1254] Consensus pattern: C-C-x(5)-R-x(2)-[FY]-x(2)-C [The three C's are involved in disulfide bonds]. The proteins
from the gamma-thionin family are not related to the above proteins and are described in a separate section.

[ 1] Vernon L.P., Evett G.E., Zeikus R.D., Gray W.R. Arch. Biochem. Biophys. 238:18-29(1985).

[ 2] Bohlmann H., Clausen S., Behnke S., Giese H., Hiller C., Reimann-Philipp U., Schrader G., Barkholt V., Apel
K. EMBO J. 7:1559-1565(1988).

[ 3] Bohlmann H., Apel K. Mol. Gen. Genet. 207:446-454(1987).
[ 4] Teeter M.M., Mazer J.A., L'Italien J.J. Biochemistry 20:5437-5443(1981).

[1255] 461. Polyprenyl synthetases signatures

A variety of isoprenoid compounds are synthesized by various organisms. For example, in eukaryotes the isoprenoid
biosynthetic pathway is responsible for the synthesis of a variety of end products including cholesterol, dolichol,
ubiquinone or coenzyme Q. In bacteria this pathway leads to the synthesis of isopentenyl tRNA, isoprenoid quinones, and
sugar carrier lipids. Among the enzymes that participate in that pathway are a number of polyprenyl synthetase
enzymes which catalyze a 1'-4 condensation between 5-carbon isoprene units. Currently the sequence of some of these
enzymes is known: - Eukaryotic farnesyl pyrophosphate synthetase (FPP synthetase) (EC 2.5.1.1 / EC 2.5.1.10), which
catalyzes the sequential condensation of isopentenyl pyrophosphate (IPP) with dimethylallyl pyrophosphate (DMAPP),
and then with the resultant geranyl pyrophosphate to form farnesyl pyrophosphate. FPP synthetase is a cytoplasmic
dimeric enzyme. - Prokaryotic farnesyl pyrophosphate synthetase (gene ispA). - Prokaryotic octaprenyl diphosphate
synthase (gene ispB). - Prokaryotic heptaprenyl diphosphate synthase (EC 2.5.1.30). - Eukaryotic geranylgeranyl
pyrophosphate synthetase (GGPP synthetase) (EC 2.5.1.1 / EC 2.5.1.10 / EC 2.5.1.29), which catalyzes the sequential
addition of the three molecules of IPP onto DMAPP to form geranylgeranyl pyrophosphate. In plants GGPP synthetase
is a chloroplast enzyme involved in the biosynthesis of terpenoids; in fungi, such as Neurospora crassa (gene al-3),
this enzyme is involved in the biosynthesis of carotenoids. - Prokaryotic GGPP synthetases, which are involved in the
biosynthesis of carotenoids (gene crtE). Such an enzyme is also encoded in the cyanelle genome of Cyanophora
paradoxa. - Eukaryotic hexaprenyl pyrophosphate synthetase, which is involved in the biosynthesis of coenzyme Q
and which catalyzes the formation of all trans-polyprenyl pyrophosphates generally ranging in length between 6
and 10 isoprene units depending on the species. HP synthetase is a mitochondrial membrane-associated enzyme. It
has been shown [1 to 5] that all the above enzymes share some regions of sequence similarity. Two of these regions
are rich in aspartic-acid residues and could be involved in the catalytic mechanism and/or the binding of the substrates.
Signature patterns were developed for both regions. Possible additional members of this family of proteins are: - Bacillus
subtilis spore germination protein C3 (gene gerC3). Both proteins are most probably also enzymes involved in
isoprenoid metabolism [6].

[1256] Consensus pattern: [LIVM](2)-x-D-D-x(2,4)-D-x(4)-R-R-[GH]
Consensus pattern: [LIVMFY]-G-x(2)-[FYL]-Q-[LIVM]-x-D-D-[LIVMFY]-x-[DNG]
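The consensus patterns in this description use a compact notation: residues are separated by '-', 'x' stands for any residue, square brackets list the residues allowed at a position, and a number in parentheses gives a repeat count or range. As an illustration only, the following minimal Python sketch translates such a pattern into a regular expression and scans a sequence with it. The function name prosite_to_regex and the toy peptide are hypothetical and are not part of the original disclosure; the handling of '{...}' (excluded residues) and of the '<' and '>' anchors follows the usual reading of this notation and is an assumption here.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        element = element.strip()
        if not element:
            continue
        prefix = suffix = ""
        if element.startswith("<"):      # '<' anchors the match at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):        # '>' anchors the match at the C-terminus
            suffix, element = "$", element[:-1]
        # Separate the residue specification from an optional count, e.g. x(2,4).
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.groups()
        if core == "x":                  # 'x' stands for any residue
            core = "."
        elif core.startswith("{"):       # '{ABC}' lists residues that are NOT allowed
            core = "[^" + core[1:-1] + "]"
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + core + suffix)
    return "".join(parts)

# Second polyprenyl synthetase signature, as given in the text.
SIGNATURE = "[LIVMFY]-G-x(2)-[FYL]-Q-[LIVM]-x-D-D-[LIVMFY]-x-[DNG]"
regex = prosite_to_regex(SIGNATURE)

# Hypothetical toy peptide built to contain the motif; not a real enzyme sequence.
toy_sequence = "AAALGAAFQLADDLADAAA"
hit = re.search(regex, toy_sequence)
print(regex)
print(hit.start() + 1 if hit else None)   # 1-based position of the first match
```

A match only shows that the sequence is compatible with the signature; it does not by itself establish that the protein belongs to the family.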

[ 1] Ashby M.N., Edwards P.A. J. Biol. Chem. 265:13157-13164(1990).
[ 2] Fujisaki S., Hara H., Nishimura Y., Horiuchi K., Nishino T. J. Biochem. 108:995-1000(1990).
[ 3] Carattoli A., Romano N., Ballario P., Morelli G., Macino G. J. Biol. Chem. 266:5854-5859(1991).
[ 4] Kuntz M., Roemer S., Suire C., Hugueney P., Weil J.H., Schantz R., Camara B. Plant J. 2:25-34(1992).
[ 5] Math S.K., Hearst J.E., Poulter C.D. Proc. Natl. Acad. Sci. U.S.A. 89:6761-6764(1992).
[ 6] Bairoch A. Unpublished observations (1993).

[1257] 462. Potato inhibitor I family signature

The potato inhibitor I family is one of the numerous families of serine proteinase inhibitors. Members of this protein
family are found in plants: in the seeds of barley or beans [1,2,3], and in potato or tomato leaves, where they accumulate
in response to mechanical damage [4,5]. An inhibitor belonging to this family is also found in leech [6]. It is interesting
to note that, currently, this is the only proteinase inhibitor family to be found in both the plant and animal kingdoms.
Structurally these inhibitors are small (60 to 90 residues) and, in contrast with other families of protease inhibitors, they lack
disulfide bonds. They have a single inhibitory site. The consensus pattern includes three out of the four residues
conserved in all members of this family and is located in the N-terminal half.

Consensus pattern: [FYW]-P-[EQH]-[LIV](2)-G-x(2)-[STAGV]-x(2)-A. Barley subtilisin-chymotrypsin inhibitor-2b has
Glu instead of Gly. There is a trypsin inhibitor from the cucurbitaceae Momordica charantia [7], which is said to belong
to the potato inhibitor I family but which shows only a very weak similarity with the other members of this family.

[ 1] Svendsen I., Hejgaard J., Chavan J.K. Carlsberg Res. Commun. 49:493-502(1984).
[ 2] Svendsen I., Boisen S., Hejgaard J. Carlsberg Res. Commun. 47:45-53(1982).
[ 3] Nozawa H., Yamagata H., Aizono Y., Yoshikawa M., Iwasaki T. J. Biochem. 106:1003-1008(1989).
[ 4] Cleveland T.E., Thornburg R.W., Ryan C.A. Plant Mol. Biol. 8:199-207(1987).
[ 5] Lee J.S., Brown W.E., Graham J.S., Pearce G., Fox E.A., Dreher T.W., Ahern K.G., Pearson G.D., Ryan C.A.
Proc. Natl. Acad. Sci. U.S.A. 83:7277-7281(1986).
[ 6] Seemuller U., Eulitz M., Fritz H., Strobl A. Hoppe-Seyler's Z. Physiol. Chem. 361:1841-1846(1980).
[ 7] Zeng F.-Y., Qian R.-Q., Wang Y. FEBS Lett. 234:35-38(1988).

[1258] 463. (pp binding) Phosphopantetheine attachment site 

Phosphopantetheine (or pantetheine 4' phosphate) is the prosthetic group of acyl carrier proteins (ACP) in some
multienzyme complexes where it serves as a 'swinging arm' for the attachment of activated fatty acid and amino-acid
groups [1]. Phosphopantetheine is attached to a serine residue in these proteins [2]. ACP proteins or domains have
been found in various enzyme systems which are listed below (references are only provided for recently determined
sequences). - Fatty acid synthetase (FAS), which catalyzes the formation of long-chain fatty acids from acetyl-CoA,
malonyl-CoA and NADPH. Bacterial and plant chloroplast FAS are composed of eight separate subunits which
correspond to the different enzymatic activities; ACP is one of these polypeptides. Fungal FAS consists of two multifunctional
proteins, FAS1 and FAS2; the ACP domain is located in the N-terminal section of FAS2. Vertebrate FAS consists of a
single multifunctional enzyme; the ACP domain is located between the beta-ketoacyl reductase domain and the
C-terminal thioesterase domain [3]. - Polyketide antibiotics synthase enzyme systems. Polyketides are secondary
metabolites produced from simple fatty acids by microorganisms and plants. ACP is one of the polypeptide components
involved in the biosynthesis of Streptomyces polyketide antibiotics actinorhodin, curamycin, granaticin, monensin,
oxytetracycline and tetracenomycin C. - Bacillus subtilis putative polyketide synthases pksK, pksL and pksM, which
respectively contain three, five and one ACP domains. - The multifunctional 6-methylsalicylic acid synthase (MSAS)
from Penicillium patulum. This is a multifunctional enzyme involved in the biosynthesis of a polyketide antibiotic and
which contains an ACP domain in the C-terminal extremity. - Multifunctional mycocerosic acid synthase (gene mas)
from Mycobacterium bovis. - Gramicidin S synthetase I (gene grsA) from Bacillus brevis. This enzyme catalyzes the
first step in the biosynthesis of the cyclic antibiotic gramicidin S. - Tyrocidine synthetase I (gene tycA) from Bacillus
brevis. The reaction carried out by tycA is identical to that catalyzed by grsA. - Gramicidin S synthetase II (gene grsB)
from Bacillus brevis. This enzyme is a multifunctional protein that activates and polymerizes proline, valine, ornithine
and leucine. GrsB contains four ACP domains. - Erythronolide synthase proteins 1, 2 and 3 from Saccharopolyspora
erythraea, which are involved in the biosynthesis of the polyketide antibiotic erythromycin. Each of these proteins contains
two ACP domains. - Conidial green pigment synthase from Aspergillus nidulans. - ACV synthetase from various fungi.
This enzyme catalyzes the first step in the biosynthesis of penicillin and cephalosporin. It contains three ACP domains.
- Enterobactin synthetase component F (gene entF) from Escherichia coli. This enzyme is involved in the ATP-dependent
activation of serine during enterobactin (enterochelin) biosynthesis. - Cyclic peptide antibiotic surfactin synthase
subunits 1, 2 and 3 from Bacillus subtilis. Subunits 1 and 2 contain three related domains while subunit 3 only contains
a single domain. - HC-toxin synthetase (gene HTS1) from Cochliobolus carbonum. This enzyme synthesizes HC-toxin,
a cyclic tetrapeptide. HTS1 contains four ACP domains. - Fungal mitochondrial ACP [9], which is part of the respiratory
chain NADH dehydrogenase (complex I). - Rhizobium nodulation protein nodF, which probably acts as an ACP in the
synthesis of the nodulation Nod factor fatty acyl chain. The sequence around the phosphopantetheine attachment site
is conserved in all these proteins and can be used as a signature pattern. A profile was also developed that spans the
complete ACP-like domain.

[1259] Consensus pattern: [DEQGSTALMKRH]-[LIVMFYSTAC]-[GNQ]-[LIVMFYAG]-[DNEKHS]-S-[LIVMST]-{PCFY}-
[STAGCPQLIVMF]-[LIVMATN]-[DENQGTAKRHLM]-[LIVMWSTA]-[LIVGSTACR]-x(2)-[LIVMFA] [S is the pantetheine
attachment site]
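Because the pattern designates one specific serine as the pantetheine attachment site, a scan can report the position of that residue rather than just the start of the match. The sketch below is illustrative only: the regular expression is one reading of the consensus pattern above (with {PCFY} taken to mean "any residue except P, C, F or Y"), and the function name attachment_sites and the toy peptide are hypothetical.

```python
import re

# One reading of the phosphopantetheine attachment-site pattern; the capture
# group isolates the serine that carries the prosthetic group.
PP_BINDING = re.compile(
    r"[DEQGSTALMKRH][LIVMFYSTAC][GNQ][LIVMFYAG][DNEKHS]"
    r"(S)"
    r"[LIVMST][^PCFY][STAGCPQLIVMF][LIVMATN][DENQGTAKRHLM]"
    r"[LIVMWSTA][LIVGSTACR].{2}[LIVMFA]"
)

def attachment_sites(sequence: str) -> list[int]:
    """Return 1-based positions of candidate attachment serines."""
    return [m.start(1) + 1 for m in PP_BINDING.finditer(sequence.upper())]

# Hypothetical toy peptide constructed to satisfy the pattern.
toy = "MKKDLGLDSLASLDLLAALKK"
print(attachment_sites(toy))
```

Matches found with finditer do not overlap; a scan that must report overlapping candidates would need a lookahead-based expression instead.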

[ 1] Concise Encyclopedia Biochemistry, Second Edition, Walter de Gruyter, Berlin New-York (1988).
[ 2] Pugh E.L., Wakil S.J. J. Biol. Chem. 240:4727-4733(1965).
[ 3] Witkowski A., Rangan V.S., Randhawa Z.I., Amy C.M., Smith S. Eur. J. Biochem. 198:571-579(1991).
[ 6] Scotti C., Piatti M., Cuzzoni A., Perani P., Tognoni A., Grandi G., Galizzi A., Albertini A.M. Gene 130:65-71
(1993).
[ 9] Sackmann U., Zensen R., Rohlen D., Jahnke U., Weiss H. Eur. J. Biochem. 200:463-469(1991).

[1260] 464. (Prenyltrans) Terpene synthases signature
The following enzymes catalyze mechanistically related reactions which involve the highly complex cyclic
rearrangement of squalene or its 2,3 oxide: - Lanosterol synthase (EC 5.4.99.7) (oxidosqualene--lanosterol cyclase), which
catalyzes the cyclization of (S)-2,3-epoxysqualene to lanosterol, the initial precursor of cholesterol, steroid hormones
and vitamin D in vertebrates and of ergosterol in fungi (gene ERG7). - Cycloartenol synthase (EC 5.4.99.8)
(2,3-epoxysqualene--cycloartenol cyclase), a plant enzyme that catalyzes the cyclization of (S)-2,3-epoxysqualene to
cycloartenol. - Hopene synthase (EC 5.4.99.-) (squalene--hopene cyclase), a bacterial enzyme that catalyzes the cyclization
of squalene into hopene, a key step in hopanoid (triterpenoid) metabolism. These enzymes are evolutionarily related [1]
proteins of about 70 to 85 Kd. As a signature pattern, a highly conserved region was selected which is rich in aromatic
residues and which is located in the C-terminal section.

[1261] Consensus pattern: [DE]-G-S-W-x-G-x-W-[GA]-[LIVM]-x-[FY]-x-Y-[GA]

[1262] [ 1] Corey E.J., Matsuda S.P.T., Bartel B. Proc. Natl. Acad. Sci. U.S.A. 90:11628-11632(1993).
[1263] 465. Prion protein signatures

Prion protein (PrP) [1,2,3] is a small glycoprotein found in high quantity in the brains of humans or animals infected
with a number of degenerative neurological diseases such as Kuru, Creutzfeldt-Jakob disease (CJD), scrapie or bovine
spongiform encephalopathy (BSE). PrP is encoded in the host genome and expressed both in normal and infected
cells. It has a tendency to aggregate, yielding polymers called rods. Structurally, PrP is a protein consisting of a signal
peptide, followed by an N-terminal domain that contains tandem repeats of a short motif (PHGGGWGQ in mammals,
PHNPGY in chicken), itself followed by a highly conserved domain. Finally comes a C-terminal hydrophobic domain
post-translationally removed when PrP is attached to the extracellular side of the cell membrane by a GPI-anchor. The
structure of PrP is shown in the following schematic representation:

[Schematic: signal peptide (Sig), tandem repeats, conserved domain containing two cysteines (C), GPI-anchor region]
'C': conserved cysteine involved in a disulfide bond. '*': position of the patterns.

As signature patterns for PrP, a perfectly conserved alanine- and glycine-rich region of 16 residues was selected, as
well as a region centered on the second cysteine involved in the disulfide bond.

[1264] Consensus pattern: A-G-A-A-A-A-G-A-V-V-G-G-L-G-G-Y

Consensus pattern: E-x-[ED]-x-K-[LIVM](2)-x-[KR]-[LIVM](2)-x-[QE]-M-C-x(2)-Q-Y [C is involved in a disulfide bond]

[ 1] Stahl N., Prusiner S.B. FASEB J. 5:2799-2807(1991).
[ 2] Brunori M., Chiara Silvestrini M., Pocchiari M. Trends Biochem. Sci. 13:309-313(1988).
[ 3] Prusiner S.B. Annu. Rev. Microbiol. 43:345-374(1989).

[1265] 466. Cyclophilin-type peptidyl-prolyl cis-trans isomerase signature and profile (pro isomerase)

Cyclophilin [1] is the major high-affinity binding protein in vertebrates for the immunosuppressive drug cyclosporin A
(CSA). It exhibits a peptidyl-prolyl cis-trans isomerase activity (EC 5.2.1.8) (PPIase or rotamase). PPIase is an enzyme
that accelerates protein folding by catalyzing the cis-trans isomerization of proline imidic peptide bonds in oligopeptides
[2]. It is probable that CSA mediates some of its effects via an inhibitory action on PPIase. Cyclophilin is a cytosolic
protein which belongs to a family [3,4,5] that also includes the following isozymes: - Cyclophilin B (or S-cyclophilin), a
PPIase which is retained in an endoplasmic reticulum compartment. - Cyclophilin C, a cytoplasmic PPIase. -
Mitochondrial matrix cyclophilin (cyp3). - A PPIase which seems specific for the folding of rhodopsin and is an integral membrane
protein anchored by a C-terminal transmembrane region. This protein was first characterized in Drosophila (gene
ninaA). - Bacterial periplasmic PPIase (gene ppiA). - Bacterial cytosolic PPIase (gene ppiB). - Natural-killer cell
cyclophilin-related protein. This large protein (about 160 Kd) is a component of a putative tumor-recognition complex involved
in the function of NK cells. It contains a cyclophilin-type PPIase domain. - Mammalian nucleoporin Nup358 [6], a nuclear
pore complex protein of 358 Kd that contains a C-terminal cyclophilin-type PPIase domain. - Yeast hypothetical protein
YJR032w. - Fission yeast hypothetical protein SpAC21E11.05c. - Caenorhabditis elegans hypothetical protein
T27D1.1. The sequences of the different forms of cyclophilin-type PPIases are well conserved. As a signature pattern,
a conserved region was selected in the central part of these enzymes.

[1266] Consensus pattern: [FY]-x(2)-[STCNLV]-x-F-H-[RH]-[LIVMN]-[LIVM]-x(2)-F-[LIVM]-x-Q-[AG]-G. FKBP's, a
family of proteins that bind the immunosuppressive drug FK506, are also PPIases, but their sequence is not at all
related to that of cyclophilin.

[ 1] Stamnes M.A., Rutherford S.L., Zuker C.S. Trends Cell Biol. 2:272-276(1992).
[ 2] Fischer G., Schmid F.X. Biochemistry 29:2205-2212(1990).
[ 3] Trandinh C.C., Pao G.M., Saier M.H. Jr. FASEB J. 6:3410-3420(1992).
[ 4] Galat A. Eur. J. Biochem. 216:689-707(1993).
[ 5] Hacker J., Fischer G. Mol. Microbiol. 10:445-456(1993).
[ 6] Wu J., Matunis M.J., Kraemer D., Blobel G., Coutavas E. J. Biol. Chem. 270:14209-14213(1995).

[1267] 467. Profilin signature

Profilin [1,2] is a small eukaryotic protein that binds to monomeric actin (G-actin) in a 1:1 ratio, thus preventing the
polymerization of actin into filaments (F-actin). It can also, in certain circumstances, promote actin polymerization.
Profilin also binds to polyphosphoinositides such as PIP2. Overall sequence similarity among profilins from organisms
which belong to different phyla (ranging from fungi to mammals) is low, but the N-terminal region is relatively well
conserved. That region is thought to be involved in the binding to actin. The signature pattern for profilin is based on
conserved residues at the N-terminal extremity. A protein structurally similar to profilin is present in the genome of
variola and vaccinia viruses (gene A42R).
[1268] Consensus pattern: <x(0,1)-[STA]-x(0,1)-W-[DENQH]-x-[YI]-x-[DEQ]
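The profilin signature is described as lying at the N-terminal extremity, which in pattern notation is expressed by a leading '<' that ties the match to the start of the sequence. The following minimal Python sketch shows the effect of that anchor. The regular expression is one reading of the consensus pattern (taken here as <x(0,1)-[STA]-x(0,1)-W-[DENQH]-x-[YI]-x-[DEQ]); the function name and both test peptides are hypothetical illustrations.

```python
import re

# '^' plays the role of the '<' anchor: the motif must begin at residue 1
# (optionally after a single leading residue such as the initiator Met).
PROFILIN_NTERM = re.compile(r"^.?[STA].?W[DENQH].[YI].[DEQ]")

def has_profilin_signature(sequence: str) -> bool:
    """True if the N-terminus of the sequence fits the anchored pattern."""
    return PROFILIN_NTERM.match(sequence.upper()) is not None

# Hypothetical toy peptides: the same motif at the N-terminus and shifted inward.
print(has_profilin_signature("MAGWNAYIDNLM"))     # motif at the very start
print(has_profilin_signature("KKMAGWNAYIDNLM"))   # motif displaced by two residues
```

Without the anchor the second peptide would also match, so dropping the '<' changes the meaning of the signature rather than merely relaxing it.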

[ 1] Haarer B.K., Brown S.S. Cell Motil. Cytoskeleton 17:71-74(1990).
[ 2] Sohn R.H., Goldschmidt-Clermont P.J. BioEssays 16:465-472(1994).

[1269] 468. Protamine P1 signature

Protamines are small, highly basic proteins that substitute for histones in sperm chromatin during the haploid phase
of spermatogenesis. They pack sperm DNA into a highly condensed, stable and inactive complex. There are two
different types of mammalian protamine, called P1 and P2. P1 has been found in all species studied, while P2 is
sometimes absent. There seems to be a single type of avian protamine whose sequence is closely related to that of
mammalian P1 [1]. As a signature for this family of proteins, a conserved region was selected at the N-terminal extremity
of the sequence.

[1270] Consensus pattern: [AV]-R-[NFY]-R-x(2,3)-[ST]-x-S-x-S

[1271] [ 1] Oliva R., Goren R., Dixon G.H. J. Biol. Chem. 264:17627-17630(1989).
[1272] 469. Sperm histone P2 (protamine P2)
This protein, also known as protamine P2, can substitute for histones in the chromatin of sperm. The alignment contains
both the sequence of the mature P2 protein and its propeptide.
[1273] 470. Proteasome A-type subunits signature

The proteasome (or macropain) (EC 3.4.99.46) [1 to 5,E1] is a eukaryotic and archaebacterial multicatalytic proteinase
complex that seems to be involved in an ATP/ubiquitin-dependent nonlysosomal proteolytic pathway. In eukaryotes the
proteasome is composed of about 28 distinct subunits which form a highly ordered ring-shaped structure (20S ring) of
about 700 Kd. Most proteasome subunits can be classified, on the basis of sequence similarities, into two groups, A
and B. Subunits that belong to the A-type group are proteins of from 210 to 290 amino acids that share a number of
conserved sequence regions. Subunits that are known to belong to this family are listed below. - Vertebrate subunits
C2 (nu), C3, C8, C9, iota and zeta. - Drosophila PROS-25, PROS-28.1, PROS-29 and PROS-35. - Yeast C1 (PRS1),
C5 (PRS3), C7-alpha (Y8) (PRS2), Y7, Y13, PRE5, PRE6 and PUP2. - Arabidopsis thaliana subunits alpha and PSM30.
- Thermoplasma acidophilum alpha-subunit. In this archaebacterium the proteasome is composed of only two different
subunits. As a signature pattern for proteasome A-type subunits the best conserved region was selected, which is
located in the N-terminal part of these proteins.

[1274] Consensus pattern: [FY]-x(4)-[STNV]-x-[FYW]-S-P-x-G-[RKH]-x(2)-Q-[LIVM]-[DE]-Y-[SAD]-x(2)-[SAG].
These proteins belong to family T1 in the classification of peptidases [6,E2].

[ 1] Rivett A.J. Biochem. J. 291:1-10(1993).
[ 2] Rivett A.J. Arch. Biochem. Biophys. 268:1-8(1989).
[ 3] Goldberg A.L., Rock K.L. Nature 357:375-379(1992).
[ 4] Wilk S. Enzyme Protein 47:187-188(1993).
[ 5] Hilt W., Wolf D.H. Trends Biochem. Sci. 21:96-102(1996).
[ 6] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:19-61(1994).

[1275] Proteasome B-type subunits signature

The proteasome (or macropain) (EC 3.4.99.46) [1 to 5,E1] is a eukaryotic and archaebacterial multicatalytic proteinase
complex that seems to be involved in an ATP/ubiquitin-dependent nonlysosomal proteolytic pathway. In eukaryotes
the proteasome is composed of about 28 distinct subunits which form a highly ordered ring-shaped structure (20S ring)
of about 700 Kd. Most proteasome subunits can be classified, on the basis of sequence similarities, into two groups,
A and B. Subunits that belong to the B-type group are proteins of from 190 to 290 amino acids that share a number of
conserved sequence regions. Subunits that are known to belong to this family are listed below. - Vertebrate subunits
C5, beta, delta, epsilon, theta (C10-II), LMP2/RING12, C13 (LMP7/RING10), C7-I and MECL-1. - Yeast PRE1, PRE2
(PRG1), PRE3, PRE4, PRS3, PUP1 and PUP3. - Drosophila L(3)73Ai. - Fission yeast pts1. - Thermoplasma
acidophilum beta-subunit. In this archaebacterium the proteasome is composed of only two different subunits. As a signature
pattern for proteasome B-type subunits the best conserved region was selected, which is located in the N-terminal part
of these proteins.

[1276] Consensus pattern: [LIVMA]-[GSA]-[LIVMF]-x-[FYLVGAC]-x(2)-[GSACFY]-[LIVMSTAC](3)-[GAC]-
[GSTACV]-[DES]-x(15)-[RK]-x(12,13)-G-x(2)-[GSTA]-D. These proteins belong to family T1 in the classification of
peptidases [6,E2].
[ 1] Rivett A.J. Biochem. J. 291:1-10(1993).
[ 2] Rivett A.J. Arch. Biochem. Biophys. 268:1-8(1989).
[ 3] Goldberg A.L., Rock K.L. Nature 357:375-379(1992).
[ 4] Wilk S. Enzyme Protein 47:187-188(1993).
[ 5] Hilt W., Wolf D.H. Trends Biochem. Sci. 21:96-102(1996).
[ 6] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:19-61(1994).

[1277] 471. (pyr_redox) Pyridine nucleotide-disulphide oxidoreductases class-I active site

The pyridine nucleotide-disulphide oxidoreductases are FAD flavoproteins which contain a pair of redox-active
cysteines involved in the transfer of reducing equivalents from the FAD cofactor to the substrate. On the basis of
sequence and structural similarities [1] these enzymes can be classified into two categories. The first category groups
together the following enzymes [2 to 6]: - Glutathione reductase (EC 1.6.4.2) (GR). - Higher eukaryotes thioredoxin
reductase (EC 1.6.4.5). - Trypanothione reductase (EC 1.6.4.8). - Lipoamide dehydrogenase (EC 1.8.1.4), the E3
component of alpha-ketoacid dehydrogenase complexes. - Mercuric reductase (EC 1.16.1.1). The sequence around
the two cysteines involved in the redox-active disulfide bond is conserved and can be used as a signature pattern.
[1278] Consensus pattern: G-G-x-C-[LIVA]-x(2)-G-C-[LIVM]-P [The two C's form the active site disulfide bond]. In
positions 6 and 7 of the pattern all known sequences have Asn-(Val/Ile), with the exception of GR from plant chloroplasts
and from cyanobacteria, which have Ile-Arg [7].
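The remark about positions 6 and 7 can be checked mechanically, because those positions correspond to the x(2) element between [LIVA] and the second G-C of the pattern. The sketch below, given for illustration only, captures those two residues and sorts a match into the two classes mentioned in the text. The function name classify_positions_6_7 and the toy peptides are hypothetical; they are not sequences from the disclosure.

```python
import re
from typing import Optional

# G-G-x-C-[LIVA]-x(2)-G-C-[LIVM]-P; the group captures pattern positions 6 and 7.
ACTIVE_SITE = re.compile(r"GG.C[LIVA](..)GC[LIVM]P")

def classify_positions_6_7(sequence: str) -> Optional[str]:
    """Describe the residue pair at positions 6-7 of the first match, if any."""
    match = ACTIVE_SITE.search(sequence.upper())
    if match is None:
        return None
    pair = match.group(1)
    if pair[0] == "N" and pair[1] in "VI":
        return "Asn-(Val/Ile)"
    if pair == "IR":
        return "Ile-Arg"
    return "other: " + pair

# Hypothetical toy peptides, one per class.
print(classify_positions_6_7("AAGGTCVNVGCIPKK"))
print(classify_positions_6_7("AAGGTCVIRGCVPKK"))
```

The "other" branch is kept so that a sequence outside both classes is reported rather than silently assigned to one of them.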

[ 1] Kuriyan J., Krishna T.S.R., Wong L., Guenther B., Pahler A., Williams C.H. Jr., Model P. Nature 352:172-174
(1991).
[ 2] Rice D.W., Schulz G.E., Guest J.R. J. Mol. Biol. 174:483-496(1984).
[ 3] Brown N.L. Trends Biochem. Sci. 10:400-402(1985).
[ 4] Carothers D.J., Pons G., Patel M.S. Arch. Biochem. Biophys. 268:409-425(1989).
[ 5] Walsh C.T., Bradley M., Nadeau K. Trends Biochem. Sci. 16:305-309(1991).
[ 6] Gasdaska P.Y., Gasdaska J.R., Cochran S., Powis G. FEBS Lett. 373:5-9(1995).
[ 7] Creissen G., Edwards E.A., Enard C., Wellburn A., Mullineaux P. Plant J. 2:129-131(1992).

[1279] 472. (pyridoxal_deC) DDC / GAD / HDC / TyrDC pyridoxal-phosphate attachment site (pyridoxal_deC)
Three different enzymes, all pyridoxal-dependent decarboxylases, seem to share regions of sequence similarity
[1,2,3,4], especially in the vicinity of the lysine residue which serves as the attachment site for the pyridoxal-phosphate
(PLP) group. These enzymes are: - Glutamate decarboxylase (EC 4.1.1.15) (GAD). Catalyzes the decarboxylation of
glutamate into the neurotransmitter GABA (4-aminobutanoate). - Histidine decarboxylase (EC 4.1.1.22) (HDC).
Catalyzes the decarboxylation of histidine to histamine. There are two completely unrelated types of HDC: those that use
PLP as a cofactor (found in Gram-negative bacteria and mammals) and those that contain a covalently bound pyruvoyl
residue (found in Gram-positive bacteria). - Aromatic-L-amino-acid decarboxylase (EC 4.1.1.28) (DDC), also known
as L-dopa decarboxylase or tryptophan decarboxylase. DDC catalyzes the decarboxylation of tryptophan to tryptamine.
It also acts on 5-hydroxy-tryptophan and dihydroxyphenylalanine (L-dopa). - Tyrosine decarboxylase (EC 4.1.1.25)
(TyrDC), which converts tyrosine into tyramine, a precursor of isoquinoline alkaloids and various amides. These enzymes
are collectively known as group II decarboxylases [3,4].

[1280] C jn^ensus pattt-tn fa-[LIVMf-VWj ^(3) K [L!VMf-YWG](2}-<{2) [LIVMP/W] x [OA}-v(2>-[L!\ MhYW^]-xfzV 
[RK] [K is the pyridoxal-P attachment site] 

[ 1] Jackson F.R. J. Mol. Evol. 31:325-329(1990).
[ 2] Joseph D.R., Sullivan P., Wang Y.-M., Kozak C., Fenstermacher D.A., Behrendsen M.E., Zahnow C.A. Proc.
Natl. Acad. Sci. U.S.A. 87:733-737(1990).
[ 3] Sandmeier E., Hale T.I., Christen P. Eur. J. Biochem. 221:997-1002(1994).
[ 4] Ishii S., Mizuguchi H., Nishino J., Hayashi H., Kagamiyama H. J. Biochem. 120:369-376(1996).

[1281] 473. Regulator of chromosome condensation (RCC1) signatures (RCC1)

The regulator of chromosome condensation (RCC1) [1] is a eukaryotic protein which binds to chromatin and interacts
with ran, a nuclear GTP-binding protein, to promote the loss of bound GDP and the uptake of fresh GTP, thus acting
as a guanine-nucleotide dissociation stimulator (GDS) [2]. The interaction of RCC1 with ran probably plays an important
role in the regulation of gene expression. RCC1, known as PRP20 or SRM1 in yeast, pim1 in fission yeast and BJ1 in
Drosophila, is a protein that contains seven tandem repeats of a domain of about 50 to 60 amino acids. As shown in
the following schematic representation, the repeats make up the major part of the length of the protein. Outside the
repeat region, there is just a small N-terminal domain of about 40 to 50 residues and, in the Drosophila protein only, a
C-terminal domain of about 130 residues.

+---+-----+-----+-----+-----+-----+-----+-----+------------+
|N-t|Rpt 1|Rpt 2|Rpt 3|Rpt 4|Rpt 5|Rpt 6|Rpt 7| C-terminal |
+---+-----+-----+-----+-----+-----+-----+-----+------------+
                                                in Drosophila

Two signature patterns for RCC1 were developed. The first is found in the N-terminal part
of the second repeat; this is the most conserved part of RCC1. The second is derived from conserved positions in the
C-terminal part of each repeat and detects up to five copies of the repeated domain. The RCC1-type of repeat is also
found in the X-linked retinitis pigmentosa GTPase regulator [3].
[1282] Consensus pattern: G-x-N-D-x(2)-[AV]-L-G-R-x-T
Consensus pattern: [LIVMFA]-[STAGC](2)-G-x(2)-H-[STAGLI]-[LIVMFA]-x-[LIVM]
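Since the second RCC1 pattern is said to detect several copies of the repeated domain within one protein, a scan with it naturally returns a list of hits rather than a single position. The following minimal sketch counts non-overlapping matches. The regular expression is one reading of the second consensus pattern (taken here as [LIVMFA]-[STAGC](2)-G-x(2)-H-[STAGLI]-[LIVMFA]-x-[LIVM]); the function name repeat_starts and the artificial three-repeat sequence are hypothetical.

```python
import re

# One reading of the second RCC1 signature (C-terminal part of each repeat).
RCC1_REPEAT = re.compile(r"[LIVMFA][STAGC]{2}G.{2}H[STAGLI][LIVMFA].[LIVM]")

def repeat_starts(sequence: str) -> list[int]:
    """Return the 1-based start of every non-overlapping match."""
    return [m.start() + 1 for m in RCC1_REPEAT.finditer(sequence.upper())]

# Artificial sequence: three copies of a motif-bearing unit separated by spacers.
toy = ("VAAGAAHSLAL" + "KKKKK") * 3
starts = repeat_starts(toy)
print(len(starts), starts)
```

The number of hits is a lower bound on the number of repeats, since a repeat whose C-terminal part has diverged from the pattern is simply not reported.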



[ 1] Dasso M. Trends Biochem. Sci. 18:96-101(1993).
[ 2] Boguski M.S., McCormick F. Nature 366:643-654(1993).
[ 3] Roepman R., van Duijnhoven G., Rosenberg T., Pinckers A.J.L.G., Bleeker-Wagemakers L.M., Bergen A.A.B.,
Post J., Beck A., Reinhardt R., Ropers H.-H., Cremers F., Berger W. Hum. Mol. Genet. 5:1035-1041(1996).



[1283] 474. RNA 3'-terminal phosphate cyclase signature (RCT)

RNA 3'-terminal phosphate cyclase (EC 6.5.1.4) [1,2] catalyzes the conversion of 3'-phosphate to a 2',3'-cyclic
phosphodiester at the end of RNA. The biological role of this enzyme is unknown but it is likely to function in some aspects
of cellular RNA processing. The reaction catalyzed by the enzyme occurs in three steps: 1) adenylation of the enzyme
by ATP; 2) the enzyme acts on RNA-3'terminal phosphate to produce RNA-3'terminal diphosphate adenylate; 3) release
of AMP and cyclisation by a non-catalytic nucleophilic attack by the adjacent 2' hydroxyl on the phosphorus in
the diester linkage. This enzyme, which has been characterized in human (where there seem to be at least three
isozymes) and Escherichia coli (gene rtcA), seems to be taxonomically widespread. It is found in insects, plants, fungi
(gene RTC1 in yeast) and in archaebacteria. RNA cyclase is a protein of from 36 to 42 Kd. The best conserved region,
which is used as a signature pattern, is a glycine-rich stretch of residues located in the central part of the sequence
and which is reminiscent of various ATP, GTP or AMP glycine-rich loops. In this context, the conserved Arg (His in the
E. coli enzyme) could be the AMP-binding residue.

[1284] Consensus pattern: [RH]-G-x(2)-P-x-G(3)-x-[LIV]

[ 1] Genschik P., Billy E., Swianiewicz M., Filipowicz W. EMBO J. 16:2955-2967(1997).
[ 2] Filipowicz W., Vicente O. Meth. Enzymol. 181:499-510(1990).

[1285] 475. REV protein (anti-repression trans-activator protein) 

[1286] 476. Prokaryotic-type class I peptide chain release factors signature (RF-1)

Peptide chain release factors (RFs) are required for the termination of protein biosynthesis [1]. At present two classes
of RFs can be distinguished. Class I RFs bind to ribosomes that have encountered a stop codon at their decoding site
and induce release of the nascent polypeptide. Class II RFs are GTP-binding proteins that interact with class I RFs
and enhance class I RF activity. In prokaryotes there are two class I RFs that act in a codon-specific manner [2]: RF-1
(gene prfA) mediates UAA and UAG-dependent termination while RF-2 (gene prfB) mediates UAA and UGA-dependent
termination. RF-1 and RF-2 are structurally and evolutionarily related proteins which have been shown [3] to make up
a family that also contains the following proteins: - Fungal MRF1, a mitochondrial RF (m-RF) which recognizes the
UAA and UAG codons. - Escherichia coli RF-H, a protein of unknown function. - Escherichia coli hypothetical protein
yaeJ and a close Pseudomonas putida homolog. A highly conserved region located in the central part of the 40 to 45
Kd RF-1/2 and m-RF and in the N-terminal of the 15 to 16 Kd RF-H and yaeJ is used as a signature pattern.
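The codon specificity of the two prokaryotic class I release factors described above amounts to a small lookup table: UAA is read by both factors, UAG only by RF-1 and UGA only by RF-2. The following minimal sketch encodes that table for illustration; the dictionary and function names are hypothetical, and accepting DNA-style codons (T in place of U) is a convenience assumed here.

```python
# Stop codons recognised by each prokaryotic class I release factor,
# as stated in the text (RF-1: UAA/UAG, RF-2: UAA/UGA).
RELEASE_FACTOR_CODONS = {
    "RF-1": {"UAA", "UAG"},
    "RF-2": {"UAA", "UGA"},
}

def release_factors_for(codon: str) -> list[str]:
    """Return the class I release factors that recognise the given codon."""
    rna_codon = codon.upper().replace("T", "U")   # accept DNA-style input too
    return sorted(
        factor
        for factor, codons in RELEASE_FACTOR_CODONS.items()
        if rna_codon in codons
    )

for stop in ("UAA", "UAG", "UGA", "AUG"):
    print(stop, release_factors_for(stop))
```

A sense codon such as AUG yields an empty list, which makes the function usable as a simple stop-codon test as well.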
[1287] Consensus pattern: [AR]-[STA]-x-G-x-G-G-Q-[HNGCS]-V-N-x(3)-[ST]-A-[IV]

Note that prokaryotic-type class I RFs display no significant sequence similarity to prokaryotic-type class II RFs, which belong
to the family of GTP-binding elongation factors, nor to eukaryotic class I or class II RFs.

[ 1] Tate W.P., Poole E.S., Mannering S.M. Prog. Nucleic Acids Res. Mol. Biol. 52:293-335(1996).
[ 2] Craigen W.J., Lee C.C., Caskey C.T. Mol. Microbiol. 4:861-865(1990).
[ 3] Pel H.J., Rep M., Grivell L.A. Nucleic Acids Res. 20:4423-4426(1992).

[1288] 477. RIO1/ZK632.3/MJ0444 family signature 

The following uncharacterized proteins are evolutionarily related [1]: - Yeast protein RIO1. - Caenorhabditis elegans
hypothetical protein ZK632.3. - Methanococcus jannaschii hypothetical protein MJ0444. - Thermoplasma acidophilum
hypothetical protein in rpoA2 3'region. The eukaryotic members of this family are proteins of about 55 to 60 Kd, while
the archaebacterial ones are half that size. The central part of these proteins is highly conserved. The best conserved
region is used as a signature pattern.
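
As a rough plausibility check on the sizes quoted above, a mass in Kd can be converted to an approximate residue count. The sketch below assumes an average residue mass of about 110 Da, which is a common rule of thumb and not a figure taken from this document; the function name `approx_residues` is likewise only illustrative.

```python
# Rule-of-thumb average mass of one amino-acid residue, in daltons.
# This value is an assumption of the sketch, not a figure from the text.
AVERAGE_RESIDUE_MASS_DA = 110.0


def approx_residues(mass_kd: float) -> int:
    """Estimate the residue count of a protein from its mass in Kd."""
    return round(mass_kd * 1000.0 / AVERAGE_RESIDUE_MASS_DA)


if __name__ == "__main__":
    sizes = {
        "eukaryotic members (55 to 60 Kd)": (55.0, 60.0),
        "archaebacterial members (half that size)": (27.5, 30.0),
    }
    for label, (low, high) in sizes.items():
        print(label, approx_residues(low), "to", approx_residues(high), "residues")
```

Under that assumption the eukaryotic members would be on the order of 500 to 550 residues and the archaebacterial ones about half of that.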

[1289] Consensus pattern: [LIVM]-V-H-[GA]-D-L-S-E-[FY]-N-x-[LIVM]
[1290] [ 1] Bairoch A. Unpublished observations (1997).
[1291] 478. (RIP) Shiga/ricin ribosomal inactivating toxins active site signature

A number of bacterial and plant toxins act by inhibiting protein synthesis in eukaryotic cells. The toxins of the Shiga
and ricin family inactivate 60S ribosomal subunits by an N-glycosidic cleavage which releases a specific adenine base
from the sugar-phosphate backbone of 28S rRNA [1,2,3]. The toxins which are known to function in this manner are:
- Shiga toxin from Shigella dysenteriae [4]. This toxin is composed of one copy of an enzymatically active A subunit
and five copies of a B subunit responsible for binding the toxin complex to specific receptors on the target cell surface.
- Shiga-like toxins (SLT) are a group of Escherichia coli toxins very similar in their structure and properties to Shiga
toxin. The sequence of two types of these toxins, SLT-1 [5] and SLT-2 [6], is known.
- Ricin, a potent toxin from castor bean seeds. Ricin consists of two glycosylated chains linked by a disulfide bond.
The A chain is enzymatically active. The B chain is a lectin with a binding preference for galactosides. Both chains
are encoded by a single polypeptide precursor. Ricin is classified as a type-II ribosome-inactivating protein (RIP);
other members of this family are agglutinin, also from castor bean, and abrin from the seeds of the bean Abrus
precatorius [7].
- Single chain ribosome-inactivating proteins (type-I RIP) from plants. Examples of such proteins are barley protein
synthesis inhibitors I and II, mongolian snake-gourd trichosanthin, sponge gourd luffin-A and -B, garden four-o'clock
MAP, common pokeberry PAP-S and soapwort saporin-6 [7].
All these toxins are structurally related. A conserved glutamic residue has been implicated [8] in the catalytic
mechanism; it is located near a conserved arginine which also plays a role in catalysis [9]. The signature that has
been developed for these proteins includes these catalytic residues.

[1292] Consensus pattern: [LIVMA]-x-[LIVMSTA](2)-x-E-[SAGV]-[STAL]-R-[FY]-[RKNQS]-x-[LIVM]-[EQS]-x(2)-[LIVMF]
[E and R are active site residues]

20 [1293] [1]Erido/ Tsurugi k TakedaY Og : is : iw : ita T Igarashi K Eur I Biochem 171 45-50(1984? i [ 2} Mav M 
I , Hartley M R Pooerts L M Kri^g p A Osborn P W L?ro JM EMBO J 8 301-508< 19^0 > [ 3] Funatsu G Islam 
M R Mtnami r Sung-Sil I' Kimura M Biochimie 7 C 1io7~1161(1t»y1> [ 4] Sttockbine N A Jackson M P Sung L 
M., Holmes R.K.. O'Brien A.D. J. Bacteriol. 1 70: 1116-11 22( 1 988). [ 5] Cafdeiwood S.B.. Auclair F„ Donohue-Rolfe A.. 
Y eusoh G T Mek llanos J J Ptoc Natl Acad Sci USA W 4364-43B£( 1987> [ fs] Jaci-son M P Nuill R J O'Brien 

25 AD., Holmes R.K.. Newland J.W FEMS Microbiol. Lett. 44:109-114(1987).] 7] Barbieri L, Battel!! M.G.. Stirpe F. Sio- 
chim OStophvs Acta 1164 V -2S:.(U93) j 8| Hovde C J , Calderwood S £-3 Mekalanoi, J t Collier R J Pioc Natl 
Acad Su U S A 85 2668-2672(1 9tf 8 1 [ Moronic a F Collins F J , Ernst -? R Win J D Poberlus J D I Md 
Biol. 233:705-715(19931 

[1294] 479. Bacterial RNA polymerase alpha chain (RNA_pol_A_bac)

Members of this family include: alpha subunit from eubacteria and alpha subunits from chloroplasts. The alpha subunit
of RNA polymerase consists of two independently folded domains, referred to as amino-terminal and carboxyl-terminal
domains. The amino-terminal domain is involved in the interaction with the other subunits of the RNA polymerase. The
carboxyl-terminal domain interacts with the DNA and activators. The amino acid sequence of the alpha subunit is
conserved in prokaryotic and chloroplast RNA polymerases. There are three regions of particularly strong conservation,
two in the amino-terminal and one in the carboxyl-terminal [3].

[1] Zhang G, Darst SA; Science 1998;281:262-266. [2] Jeon YH, Negishi T, Shirakawa M, Yamazaki T, Fujita N, Ishihama
A, Kyogoku Y; Science 1995;270:1495-1497. [3] Ebright RH, Busby S; Curr Opin Genet Dev 1995;5:197-203. [4] Murakami
K, Kimura M, Owens JT, Meares CF, Ishihama A; Proc Natl Acad Sci USA 1997;94:1709-1714.
[1295] 480. RNA polymerase beta subunit (RNA_pol_B)

RNA polymerases catalyse the DNA-dependent polymerisation of RNA. Prokaryotes contain a single RNA polymerase
compared to three in eukaryotes (not including mitochondrial and chloroplast polymerases). Each RNA polymerase
complex contains two related members of this family; in each case they are the two largest subunits. [1] Falkenburg
D, Dworniczak B, Faust DM, Bautz EK; J Mol Biol 1987;195:929-937.
[1296] 481. RNA polymerases H / 23 Kd subunits signature

In eukaryotes, there are three different forms of DNA-dependent RNA polymerases (EC 2.7.7.6) transcribing different
sets of genes. Each class of RNA polymerase is an assemblage of ten to twelve different polypeptides. In archaebacteria,
there is generally a single form of RNA polymerase which also consists of an oligomeric assemblage of 10 to 13
polypeptides. Archaebacterial subunit H (gene rpoH) [1,2] is a small protein of about 8.5 to 10 Kd; it is evolutionarily
related to the C-terminal part of a 23 Kd component shared by all three forms of eukaryotic RNA polymerases (gene
RPB5 in yeast and POLR2E in mammals). As a signature pattern, a conserved region was selected which is located at
the N-terminal extremity of subunit H; this region contains two histidines that could play a role in the binding of a metal ion.
[1297] Consensus pattern: H-[NEI]-[LIVM]-V-P-x-H-x(2)-[LIVM]-x(2)-[DE]

[ 1] Klenk H.-P., Palm P., Lottspeich F., Zillig W. Proc. Natl. Acad. Sci. U.S.A. 89:407-410(1992).
[ 2] Thiru A., Hodach M., Eloranta J.J., Kostourou V., Weinzierl R.O., Matthews S. J. Mol. Biol. 287:753-760(1999).

[1298] 482. RNA polymerases K / 14 to 18 Kd subunits signature

In eukaryotes, there are three different forms of DNA-dependent RNA polymerases (EC 2.7.7.6) transcribing different
sets of genes. Each class of RNA polymerase is an assemblage of ten to twelve different polypeptides. In archaebacteria,
there is generally a single form of RNA polymerase which also consists of an oligomeric assemblage of 10 to 13
polypeptides. A component of 14 to 18 Kd shared by all three forms of eukaryotic RNA polymerases and which has
been sequenced in budding yeast (gene RPB6 or RPO26), in fission yeast (gene rpb6 or rpo15), in human and in African
swine fever virus [1] is evolutionarily related [2] to archaebacterial subunit K (gene rpoK). The archaebacterial protein
is colinear with the C-terminal part of the eukaryotic subunit.
[1299] Consensus pattern: [ST]-x-[FY]-E-x-[AT]-R-x-[LIVM]-[GSA]-x-R-[SA]-x-Q

[ 1] Lu Z., Kutish G.F., Sussman M.D., Rock D.L. Nucleic Acids Res. 21:2940-2940(1993).
[ 2] McKune K., Woychik N.A. J. Bacteriol. 176:4754-4756(1994).

[1300] 483. RNA polymerases L / 13 to 16 Kd subunits signature

In eukaryotes, there are three different forms of DNA-dependent RNA polymerases (EC 2.7.7.6) transcribing different
sets of genes. Each class of RNA polymerase is an assemblage of ten to twelve different polypeptides. In archaebacteria,
there is generally a single form of RNA polymerase which also consists of an oligomeric assemblage of 10 to 13
polypeptides. It has been shown that small subunits of about 13 to 16 Kd found in all three types of eukaryotic
polymerases are highly conserved. Subunits known to belong to this family are: - Budding yeast RPC19 subunit from RNA
polymerases I and III [1]. - Budding yeast RPB11 subunit from RNA polymerase II [2]. - Mammalian RPB11 (gene
POLR2K) from RNA polymerase II. - Caenorhabditis elegans hypothetical protein F58A4.9. - Methanococcus jannaschii
RNA polymerase subunit L (gene rpoL). - Sulfolobus acidocaldarius RNA polymerase subunit L (gene rpoL) [3]. As a
signature pattern, a conserved region was selected which is located at the N-terminal extremity of these polymerase
subunits; this region contains two cysteines that could play a role in the binding of a metal ion.
[1301] Consensus pattern: [DE](2)-H-[ST]-[LIVM]-[GAP]-N-x(11)-V-x-[FW]-x(2)-Y-x(3)-H-P

[ 1] Dequard-Chablat M., Riva M., Carles C., Sentenac A. J. Biol. Chem. 266:15300-15307(1991).
[ 2] Woychik N.A., McKune K., Lane W.S., Young R.A. Gene Expr. 3:77-82(1993).
[ 3] Langer D. EMBL/GenBank: X70805.

[1302] 484. RNA polymerases N / 8 Kd subunits signature

In eukaryotes, there are three different forms of DNA-dependent RNA polymerases (EC 2.7.7.6) transcribing different
sets of genes. Each class of RNA polymerase is an assemblage of ten to twelve different polypeptides. In archaebacteria,
there is generally a single form of RNA polymerase which also consists of an oligomeric assemblage of 10 to 13
polypeptides. Archaebacterial subunit N (gene rpoN) [1] is a small protein of about 8 Kd; it is evolutionarily related [2]
to an 8.3 Kd component shared by all three forms of eukaryotic RNA polymerases (gene RPB10 in yeast and POLR2J
in mammals) as well as to African swine fever virus protein CP80R [3]. As a signature pattern, a conserved region was
selected which is located at the N-terminal extremity of these polymerase subunits; this region contains two cysteines
that could play a role in the binding of a metal ion.
[1303] Consensus pattern: [LIVMF](2)-P-[LIVM]-x-C-F-[ST]-C-G
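
Consensus patterns written in this notation can be searched for with ordinary regular expressions. The following is a minimal sketch, assuming the pattern above reads [LIVMF](2)-P-[LIVM]-x-C-F-[ST]-C-G; the helper names `prosite_to_regex` and `cysteine_positions` and the test peptide are invented for illustration and are not part of the original disclosure. Only the notation elements used in this section are handled.

```python
import re

# One pattern element: x, a single residue, an allowed set [..] or a
# forbidden set {..}, optionally followed by a repeat count (n) or (n,m).
_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style consensus pattern into a Python regex."""
    parts = []
    for element in pattern.strip("-. ").split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = m.groups()
        if core == "x":                      # any residue
            core = "."
        elif core.startswith("{"):           # forbidden residues
            core = "[^" + core[1:-1] + "]"
        if low is not None:                  # repeat count
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)


# Assumed reading of the subunit N signature given above.
RPON_SIGNATURE = "[LIVMF](2)-P-[LIVM]-x-C-F-[ST]-C-G"


def cysteine_positions(seq: str) -> list[int]:
    """Return 1-based positions of the cysteines inside the first match."""
    m = re.search(prosite_to_regex(RPON_SIGNATURE), seq)
    if m is None:
        return []
    return [m.start() + i + 1 for i, aa in enumerate(m.group()) if aa == "C"]


if __name__ == "__main__":
    print(prosite_to_regex(RPON_SIGNATURE))
    print(cysteine_positions("MIIPVRCFTCGKV"))  # invented test peptide
```

The two reported positions correspond to the cysteines that the text proposes as possible metal ligands.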

[ 1] Langer D., Hain J., Thuriaux P., Zillig W. Proc. Natl. Acad. Sci. U.S.A. 92:5768-5772(1995).
[ 2] McKune K., Woychik N.A. J. Bacteriol. 176:4754-4756(1994).
[ 3] Yanez R.J., Rodriguez J.M., Nogal M.L., Yuste L., Enriquez C., Rodriguez J.F., Vinuela E. Virology 208:249-278
(1995).

[1304] 485. Ribonuclease HII

[1] Mian IS. Nucleic Acids Res 1997;25:3187-3189.
[1305] 486. Ribonuclease PH signature

Prokaryotic ribonuclease PH (EC 2.7.7.56) (RNase PH) [1] is a phosphorolytic exoribonuclease that removes nucleotide
residues following the -CCA terminus of tRNA and adds nucleotides to the ends of RNA molecules by using nucleoside
diphosphates as substrates. RNase PH is a conserved protein of about 240 amino-acid residues. It is evolutionarily
related to Caenorhabditis elegans hypothetical protein B0564.1. As a signature pattern, the most highly conserved
region was selected, which is located in the central part of these proteins.

Consensus pattern: C-[DE]-[LIVM](2)-Q-[GTA]-D-G-[SG]-x(2)-[TA]-A
[ 1] Kelly K.O., Deutscher M.P. J. Biol. Chem. 267:17153-17158(1992).
[1306] 487. RanBP1 domain

[1] Di Matteo G, Fuschi P, Zerfass K, Moretti S, Ricordy R, Cenciarelli C, Tripodi M, Jansen-Durr P, Lavia P; Cell Growth
Differ 1995;6:1213-1224.

[1307] 488. Rhodanese signatures 
Rhodanese (thiosulfate sulfurtransferase) (EC 2.8.1.1) [1,2] is an enzyme which catalyzes the transfer of the sulfane
atom of thiosulfate to cyanide, to form sulfite and thiocyanate. In vertebrates, rhodanese is a mitochondrial enzyme of
about 300 amino-acid residues involved in forming iron-sulfur complexes and cyanide detoxification. A cysteine residue
takes part in the catalytic mechanism. Some bacterial proteins closely related to rhodanese are also thought to express
a sulfotransferase activity. These are: - Azotobacter vinelandii rhdA. - Escherichia coli sseA [2]. - Saccharopolyspora
erythraea cysA [4]. - Synechococcus strain PCC 7942 rhdA [5]. RhdA is a periplasmic protein probably involved in the
transport of sulfur compounds. Two patterns for the rhodanese family were developed. They are based on highly
conserved regions, one which is located in the N-terminal region, the other at the C-terminal extremity of the enzyme.
[1308] Consensus pattern: [FY]-x(3)-H-[LIV]-P-G-A-x(2)-[LIVF]

Consensus pattern: [FY]-[DEAP]-G-[SA]-W-x-E-[FYW]

[ 1] Westley J. Meth. Enzymol. 77:285-291(1981).
[ 2] Weiland K.L., Dooley T.P. Biochem. J. 275:227-231(1991).
[ 3] Rudd K.E. Unpublished observations (1993).
[ 4] Donadio S., Shafiee A., Hutchinson C.R. J. Bacteriol. 172:350-360(1990).
[ 5] Laudenbach D.E., Ehrhardt D., Green L., Grossman A.R. J. Bacteriol. 173:2751-2760(1991).

[1309] 489. Ribonuclease III family signature

Prokaryotic ribonuclease III (EC 3.1.26.3) (gene rnc) [1] is an enzyme that digests double-stranded RNA. It is involved
in the processing of ribosomal RNA precursors and of some mRNAs. RNase III is evolutionarily related [2] to the following
proteins: - Fission yeast pac1, a ribonuclease that probably inhibits mating and meiosis by degrading a specific mRNA
required for sexual development. - Yeast ribonuclease III (gene RNT1), a dsRNA-specific nuclease that cleaves
eukaryotic preribosomal RNA at various sites. - Caenorhabditis elegans hypothetical protein F26E4.13. - Paramecium
bursaria chlorella virus 1 protein A464R. - Synechocystis strain PCC 6803 hypothetical protein slr0346. - Fission yeast
hypothetical protein SpAC8A4.08c, a protein with a N-terminal helicase domain and a C-terminal RNase III domain. -
Caenorhabditis elegans hypothetical protein K12H4.8, a protein with the same structure as SpAC8A4.08c. These
proteins share regions of sequence similarity, one of which is a highly conserved stretch of 9 residues which has been
developed as a signature pattern.

[1310] Consensus pattern [DEQHR^H^E-EFYVVHLV ]-G-D-[SAR]- 

[ 1] Nashimoto H., Uchida H. Mol. Gen. Genet. 201:25-29(1985).
[ 2] Mian I.S. Nucleic Acids Res. 25:3187-3195(1997).

[1311] 490. Rieske iron-sulfur protein signatures 

Ubiquinol-cytochrome c reductase (EC 1.10.2.2) (also known as the bc1 complex or complex III) is part of the electron
transport chains of mitochondria and of some aerobic prokaryotes; it catalyzes the oxidoreduction of ubiquinol and
cytochrome c. In the chloroplast of plants and in cyanobacteria, plastoquinone-plastocyanin reductase (EC 1.10.99.1)
(also known as the b6f complex) is functionally similar and catalyzes the oxidoreduction of plastoquinol and cytochrome
f. One of the components of these electron transfer systems is an iron-sulfur protein with a 2Fe-2S cluster which is
called the Rieske protein [1]. The Rieske protein contains approximately 180 amino acid residues. The iron-sulfur
cluster is complexed to the protein through cysteine and histidine residues [2]. Two perfectly conserved regions in Rieske
proteins contain all the residues that bind the iron-sulfur cluster. Both regions contain two cysteines and a histidine.
The first cysteine and the histidine are 2Fe-2S ligands while the remaining cysteines form a disulfide bond [3]. Two
conserved regions were selected as signature patterns.

[1312] Consensus pattern: C-[TK]-H-L-G-C-[LIVST] [The first C and the H are 2Fe-2S ligands] [The second C is
involved in a disulfide bond]

Consensus pattern: C-P-C-H-x-[GSA] [The first C and the H are 2Fe-2S ligands] [The second C is involved in a disulfide
bond]
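
The two Rieske signatures are used together: a candidate sequence should contain both boxes, and the roles of the residues follow from their positions inside each box. The sketch below assumes the patterns read C-[TK]-H-L-G-C-[LIVST] and C-P-C-H-x-[GSA]; the function name `rieske_ligands` and the test sequence are invented for illustration only.

```python
import re

# Assumed readings of the two signature patterns, written as regexes.
BOX1 = re.compile(r"C[TK]HLGC[LIVST]")   # C-[TK]-H-L-G-C-[LIVST]
BOX2 = re.compile(r"CPCH.[GSA]")         # C-P-C-H-x-[GSA]


def rieske_ligands(seq: str):
    """Locate both boxes and report 1-based positions of the key residues.

    Returns None unless box 1 is followed by box 2 in the sequence.
    """
    m1 = BOX1.search(seq)
    if m1 is None:
        return None
    m2 = BOX2.search(seq, m1.end())
    if m2 is None:
        return None
    s1, s2 = m1.start(), m2.start()
    return {
        # First Cys and the His of each box bind the 2Fe-2S cluster.
        "cluster_ligands": [s1 + 1, s1 + 3, s2 + 1, s2 + 4],
        # The second Cys of each box takes part in the disulfide bond.
        "disulfide_cysteines": [s1 + 6, s2 + 3],
    }


if __name__ == "__main__":
    toy = "AACTHLGCVPAAAAGGCPCHGSKK"  # invented sequence, not a real protein
    print(rieske_ligands(toy))
```

Requiring both boxes in order is a design choice of this sketch; a scan that reports each box independently would be an equally reasonable starting point.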

[ 1] Gatti D.L., Meinhardt S.W., Ohnishi T., Tzagoloff A. J. Mol. Biol. 205:421-435(1989).
[ 2] Kallas T., Spiller S., Malkin R. Proc. Natl. Acad. Sci. U.S.A. 85:5794-5798(1988).
[ 3] Iwata S., Saynovits M., Link T.A., Michel H. Structure 4:567-579(1996).

[1313] 491. Ribosomal protein L1 signature
Ribosomal protein L1 is the largest protein from the large ribosomal subunit. In Escherichia coli, L1 is known to bind to
the 23S rRNA. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1,2], groups:
- Eubacterial L1. - Algal and plant chloroplast L1. - Cyanelle L1. - Archaebacterial L1. - Vertebrate L10A. - Yeast
SSM1. As a signature pattern, the best conserved region was selected, located in the central section of these proteins.
It is located at the end of an alpha helix thought to be involved in RNA-binding.

[1314] Consensus pattern 3IM]-v(2HLI\, A]-v(2 3HLIVM]-G-H^t-ELM^]-[GbNH]-[PTKR3-[l- RAV]-'JJ-v-[LIMF]-P- 
[DENSTKQ] 

s [1j Nikono\' S V Nevskaya N Elisctkina 1 A Fomenhova N P Nthuiin A Ossina N GarberM Jonsson B -H 

Brhjnd C , AI-!\3rad.to,hi S Svemsfn L A Ae^rsson A , Lilian A EvMBO I 15 1360 1359)1990} 
[ 2] OU'f-r-i .I Wool I G 2 3 CO 2-"bioehem Biophys Ri"o Commun 220 954-95~(1«36) 

[1315] 492. Ribosomal protein L10 signature
Ribosomal protein L10 is one of the proteins from the large ribosomal subunit. L10 is a protein of 162 to 185 amino-
acid residues which has only been found so far in eubacteria. A conserved region located in the N-terminal section of
these proteins was used as a signature pattern.

[1316] Consensus pattern [DEH]-»i?i [G&Hl.lVMf-j-lS7NMVftj->-[E)EOK] [L l\ MAJ-xt?>-[LIMJ-R 

[1317] 493. Ribosomal protein L10e signature
A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities.
One of these families consists of: - Vertebrate L10 (QM) [1]. - Plant L10. - Caenorhabditis elegans L10 (F10B5.1). -
Yeast L10 (QSR1). - Methanococcus jannaschii MJ0543. These proteins have 174 to 232 amino-acid residues. A
conserved region located in the central section was selected as a signature pattern.

[1318] Consensus pattern P-v -A [FYW]-G K-[PA] v g^(.:> A- R- V 
20 [1]ChanY-L Dia^ .1 -J Denoiov L Madjaf J -J , Wool i G 2 3 >"0 2-"&ioehem Btophys Res Commun 255 

952-956(1996). 

[1319] 494. Ribosomal protein L11 signature

[1320] Ribosomal protein L11 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L11 is
known to bind directly to the 23S rRNA. It belongs to a family of ribosomal proteins which, on the basis of sequence
similarities [1,2], groups:

- Eubacterial L11.
- Plant chloroplast L11 (nuclear-encoded).
- Red algal chloroplast L11.
- Archaebacterial L11.
- Mammalian L12.
- Plant L12.
- Yeast L12 (YL15).

[1321] L11 is a protein of 140 to 165 amino-acid residues. A conserved region located in the C-terminal section of
these proteins was selected as a signature pattern. In Escherichia coli, the C-terminal half of L11 has been shown [3]
to be in an extended and loosely folded conformation and is likely to be buried within the ribosomal structure.
[1322] Consensus pattern ERKN3-s-ELIVM3^-CHST3~\r;4cNQHLlVM]-G-Y(-; t -ELiVM3~\ t O 1 i-[DE!\1G] 

[ 1] Pucciarelli G., Remacha M., Ballesta J.P.G. Nucleic Acids Res. 18:4409-4416(1990).
[ 2] Otaka E., Hashimoto T., Mizuta K., Suzuki K. Protein Seq. Data Anal. 5:301-313(1993).
[ 3] Choli T. Biochem. Int. 19:1323-1338(1989).

[1323] 495. Ribosomal protein L7/L12 C-terminal domain

[1324] [1] Leijonmarck M, Liljas A; J Mol Biol 1987;195:555-579.
[1325] 496. Ribosomal protein L13 signature

Ribosomal protein L13 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L13 is known to be
one of the early assembly proteins of the 50S ribosomal subunit. It belongs to a family of ribosomal proteins which, on
the basis of sequence similarities [1], groups: - Eubacterial L13.

- Plant chloroplast L13 (nuclear-encoded). - Red algal chloroplast L13.

- Archaebacterial L13. - Mammalian L13a (Tum P198). - Yeast Rp22 and Rp23.

[1326] L13 is a protein of 140 to 250 amino-acid residues. As a signature pattern, a conserved region was selected
located in the C-terminal section of these proteins.

[1327] Consensus pattern [I !\ MHKR\,)-[GM-M-EI fVHPS]-<^,S>-[GS3-[NQFI* RAj-^^K! IVM]-v-[AIV]-[l FY}-<- 
[GDNj 
[1328] [ 1] Chan Y.-L., Olvera J., Glueck A., Wool I.G. J. Biol. Chem. 269:5589-5594(1994).
[1329] 497. Ribosomal protein L13e signature

A number of eukaryotic ribosomal proteins can be grouped on the basis of sequence similarities. One of these
families consists of:

- Vertebrate L13 (was previously known as Breast Basic Conserved protein 1 (BBC1)). - Drosophila L13. - Plant
L13. - Yeast probable L13 (YM9375.11c).

These proteins have 189 to 218 amino-acid residues. As a signature pattern, a stretch of about 16 residues in the first
third of these proteins was selected.

- Consensus pattern: [KR]-Y-x(2)-K-[LIVM]-R-[STA]-G-[KR]-G-F-[ST]-L-x-E
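
The statement that the signature covers about 16 residues can be checked directly from the pattern by summing the number of positions each element spans. The sketch below assumes the pattern reads as given above; the function name `pattern_length` is illustrative only.

```python
import re

# Matches a trailing repeat count such as (2) or (2,3) on a pattern element.
_REPEAT = re.compile(r"\((\d+)(?:,(\d+))?\)$")


def pattern_length(pattern: str) -> tuple[int, int]:
    """Return the (minimum, maximum) number of residues a pattern spans."""
    shortest = longest = 0
    for element in pattern.strip("-. ").split("-"):
        m = _REPEAT.search(element)
        low = int(m.group(1)) if m else 1          # no count means one residue
        high = int(m.group(2)) if m and m.group(2) else low
        shortest += low
        longest += high
    return shortest, longest


if __name__ == "__main__":
    l13e = "[KR]-Y-x(2)-K-[LIVM]-R-[STA]-G-[KR]-G-F-[ST]-L-x-E"
    print(pattern_length(l13e))
```

For patterns with a variable gap such as x(2,3), the two returned values differ, which gives the range of match lengths to expect.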

[1330] [ 1] Olvera J., Wool I.G. Biochem. Biophys. Res. Commun. 201:102-107(1994).
[1331] 498. Ribosomal protein L14 signature

Ribosomal protein L14 is one of the proteins from the large ribosomal subunit. In eubacteria, L14 is known to bind
directly to the 23S rRNA. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1],
groups: - Eubacterial L14. - Algal and plant chloroplast L14. - Cyanelle L14. - Archaebacterial L14. - Yeast L17A. -
Mammalian L23.

- Caenorhabditis elegans L23 (B0336.10). - Higher eukaryotes mitochondrial L14.

- Yeast mitochondrial YmL38 (gene MRPL38).

L14 is a protein of 119 to 137 amino-acid residues. As a signature pattern, a conserved region located in the C-terminal
half of these proteins was selected.

- Consensus pattern: [GA]-[LIV](3)-x(9,10)-[DNS]-G-x(4)-[FY]-x(2)-[NT]-x(2)-V-[LIV]
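
The element x(9,10) in this pattern denotes a gap of nine or ten arbitrary residues, which corresponds to a bounded repeat in a regular expression. The sketch below assumes the pattern reads [GA]-[LIV](3)-x(9,10)-[DNS]-G-x(4)-[FY]-x(2)-[NT]-x(2)-V-[LIV]; the peptides are invented and the helper names are illustrative.

```python
import re

# Hand-written regex for the assumed reading of the L14 signature.
# x(9,10) becomes .{9,10}; fixed counts such as (3) become {3}.
L14_SIGNATURE = re.compile(r"[GA][LIV]{3}.{9,10}[DNS]G.{4}[FY].{2}[NT].{2}V[LIV]")


def toy_peptide(gap: int) -> str:
    """Build an invented peptide whose variable gap holds `gap` residues."""
    return "G" + "LIV" + "A" * gap + "D" + "G" + "AAAA" + "F" + "AA" + "N" + "AA" + "V" + "L"


def matches(gap: int) -> bool:
    """True if the toy peptide with the given gap length fits the signature."""
    return L14_SIGNATURE.fullmatch(toy_peptide(gap)) is not None


if __name__ == "__main__":
    for gap in (8, 9, 10, 11):
        print(gap, matches(gap))
```

Only the gap lengths allowed by the pattern are accepted; shorter or longer gaps fail even though every other position is satisfied.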

[1332] [ 1] Otaka E., Hashimoto T., Mizuta K., Suzuki K. Protein Seq. Data Anal. 5:301-313(1993).
[1333] 499. Ribosomal protein L15 signature

Ribosomal protein L15 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L15 is known to bind
the 23S rRNA. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1], groups: -
Eubacterial L15. - Plant chloroplast L15 (nuclear-encoded).

- Archaebacterial L15. - Vertebrate L27a. - Tetrahymena thermophila L29.

- Fungi L27a (L28, CRP-1, CYH2).

L15 is a protein of 144 to 154 amino-acid residues. As a signature pattern, a conserved region was selected in the
C-terminal section of these proteins.

- Consensus pattern: K4LIVMPHGASL3-x4GT]-x-[LIVM 
A-x(3HUVM]-x(3}-G 

[1334] [ 1] Otaka E., Hashimoto T., Mizuta K., Suzuki K. Protein Seq. Data Anal. 5:301-313(1993).
[1335] 500. Ribosomal protein L15e signature

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities
[1]. One of these families consists of:

- Mammalian L15. - Insect L15. - Plant L15. - Yeast YL10 (L13) (Rp15r).
- Thermoplasma acidophilum L15.

These proteins have about 200 amino acid residues. As a signature pattern, a conserved region was selected located
in the central section.

ss - Consensus pattern: [DEHKR]-A.R.x-L-G-[FY3-x-[SAP3-x(2)-G-[L!VMFY]{4)-R-x.R.[!V3-x-R.G 

[ 1] Zwickl P., Lupas A., Baumeister W.
Biochem. Biophys. Res. Commun. 209:684-688(1995).
[1336] 501. Ribosomal protein L17 signature

Ribosomal protein L17 is one of the proteins from the large ribosomal subunit. L17 belongs to a family of ribosomal
proteins which, on the basis of sequence similarities, groups: - Eubacterial L17.

- Yeast mitochondrial YmL8 (gene MRPL8).

Eubacterial L17 is a protein of 120 to 130 amino-acid residues. Yeast YmL8 is twice as large (238 residues); the sequence
of its N-terminal half is colinear with that of eubacterial L17. As a signature pattern, a conserved region in the N-terminal
section was selected.

10 

- Consensus pattern: !.x-[ST3-[GT]~x(2)-[KR3-x-K-x{6)-pE^x-[L!MVHLiVrViT).T-x-(STAGHKR3 
[1337] 502. Ribosomal protein L18e signature

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities.
One of these families consists of:

- Vertebrate L18 (known as L14 in Xenopus) [1]. - Plant L18.

- Yeast L18 (Rp28). - Halobacterium marismortui HL29.

- Sulfolobus acidocaldarius HL29e.

These proteins have 11? to 187 amino-acid residues. A stretch of about 13 residues in the first third of these proteins
has been selected as a signature pattern.

- Consensus pattern: [KR£j-x-L~x(2HPSHKRfx{2HRHHPSA]-x4LlVM].[NS3-[LiVM]-x-{RK].[LiVrVi3 

[ 1] Puder M., Barnard G.F., Staniunas R.J., Steele G.D. Jr., Chen L.B.
Biochim. Biophys. Acta 1216:134-136(1993).
[1338] 503. Ribosomal L18p family

It has been shown that the amino terminal 93 amino acids of Swiss:P09895 are necessary and sufficient to bind 5S
rRNA in vitro. The carboxyl-terminal half of the protein, comprising amino acids 151-296, serves to localize the protein
to the nucleolus [1].
Number of members: 28
[1]
Medline: 96212235
Distinct domains in ribosomal protein L5 mediate 5 S rRNA binding and nucleolar localization.
Michael WM, Dreyfuss G;
J Biol Chem 1996;271:11571-11574.
[1339] 504. Ribosomal protein L19 signature

Ribosomal protein L19 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L19 is known to be
located at the 30S-50S ribosomal subunit interface and may play a role in the structure and function of the
aminoacyl-tRNA binding site. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities,
groups: - Eubacterial L19.

- Red algal chloroplast L19. - Cyanelle L19.

L19 is a protein of 120 to 130 amino-acid residues.

A conserved region in the C-terminal section has been selected as a signature pattern.

- Consensus pattern; ELIVM3-x-[KRGTI]-x-[G3AI3-{KRQDA3-[VG]-[RSN]-X{0.1 HKRHSA]-EKYHKU]-ELYS3-Y-{LiM3- 

50 R 

[1340] 505. Ribosomal protein L19e signature

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities.
One of these families consists of:

- Mammalian ribosomal protein L19 [1]. - Drosophila ribosomal protein L19 [2].

- Slime mold (D. discoideum) vegetative specific protein V14 [3].

- Yeast ribosomal protein L19 (YL14). - Archebacterial ribosomal protein L19E.

[1341] These proteins have 148 to 203 amino-acid residues.

A stretch of about 20 residues in the N-terminal part of these proteins has been selected as a signature pattern.

- Consensus pattern: Q-[KR]-R-[LIVM]-x-[SA]-x(4)-[CV]-G-x(3)-[IV]-[WK]-[LIVF]-[DN]-P

s 

[ 1] Chan Y.-L., Lin A., McNally J., Peleg D., Meyuhas O., Wool I.G.
J. Biol. Chem. 262:1111-1115(1987).
[ 2] Hart K., Klein T., Wilcox M.
Mech. Dev. 43:101-110(1993).
[ 3] Singleton C.K., Manning S.S., Ken R.

NUcleiC Audb Hv S 1/ Srb'Sr f)fc>4 t >(1&30 

[1342] 506. Ribosomal protein L1e signature (Ribosomal_L4)

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities.
One of these families consists [1,2,3,4] of: - Vertebrate L1 (L4). - Drosophila L1. - Plant L1. - Yeast L2 (Rp2).

- Fission yeast L2. - Halobacterium marismortui HmaL4 (HL6).
- Methanococcus jannaschii MJ0177.

Ihe^e ptotein 1 - ha\e ''4t U"icha<rba t^rui t > 4" ihuin tin ammo sen* A oti^rved r^qton in thv hi tetmin.ji r. 3rt jf 
the^e orpins has be^n s^iect^o as a signature pattern 

20 - Consensus pattern: rg-x(3)-[KRM]-x(2VA-ELIVT3-x-S-A-[LIV]-x-A-[ST]-[SGA]-xf7V[RK3-[GS]-H 

[ i] Rafti R. Garctiulo G.. Manzi A.. Maiva C. Graziam F. 
Nu^ic A^id Rvs r 4<x? 4f-j< ! [ z] Pie^utti C X tila T E^ont I 
tvlujkic -ijirb R.^ 21 ^00 3O00{1 QC3t 
[ 3] Bagni C., Mariottini P., Annesi F., Amaldi F.
Biochim. Biophys. Acta 1216:475-478(1993).
[ 4] Arndt E., Kroemer W., Hatakeyama T. J. Biol. Chem. 265:3034-3039(1990).

[1343] 507. Ribosomal protein L2 signature
Ribosomal protein L2 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L2 is known to bind
to the 23S rRNA and to have peptidyltransferase activity. It belongs to a family of ribosomal proteins which, on the
basis of sequence similarities [1,2], groups: - Eubacterial L2.

- Algal and plant chloroplast L2. - Cyanelle L2. - Archaebacterial L2.

- Plant L2. - Slime mold L2. - Marchantia polymorpha mitochondrial L2.

- Paramecium tetraurelia mitochondrial L2. - Fission yeast K5, K37 and KD4.
- Yeast YL6. - Vertebrate L8.

The best conserved region, located in the C-terminal section of these proteins, has been selected as
a signature pattern.

- Consensus pottwn P^(2VP-G-[STA!\'3(2^-N-[APK3-x-[DE] 

[ 1] Marty I., Meyer Y.
Nucleic Acids Res. 20:1517-1522(1992).

[ 2] Otaka E., Hashimoto T., Mizuta K., Suzuki K.
Protein Seq. Data Anal. 5:301-313(1993).

[1344] 508. Ribosomal protein L20 signature
Ribosomal protein L20 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L20 is known to bind
directly to the 23S rRNA. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1],
groups: - Eubacterial L20. - Algal and plant chloroplast L20.

- Cyanelle L20.

L20 is a protein of about 120 amino-acid residues. A conserved region located in the central section of these proteins
has been selected as a signature pattern.
- Consensus pattern K-Moi-lKRCJ-MLIVMl-W-EIVJ-ESTrjALVj-R-fLh/NlHNSJ-MSl-EPKHS]

[ 1] Otaka E., Hashimoto T., Mizuta K., Suzuki K.
Protein Seq. Data Anal. 5:301-313(1993).
[1345] 509. Ribosomal protein L21e signature

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities.
One of these families consists of:

- Mammalian L21. - Entamoeba histolytica L21 [2].
- Caenorhabditis elegans L21 (C14B9.7). - Yeast L21E (URP1) [1].
- Halobacterium marismortui HL31 [4].

These proteins have 160 (eukaryotes) or 95 (archebacteria) amino-acid residues. A conserved region in the central
part of these proteins has been selected as a signature pattern.

- Consensus pattern: G-[DE]-x-V-x(10)-[GV]-x(2)-[FYH]-x(2)-[FY]-x-G-x-T-G

[ 1] Devi K.R.G., Chan Y.-L., Wool I.G.
Biochem. Biophys. Res. Commun. 162:364-370(1989).
[ 2] Petter R., Rozenblatt S., Nuchamowitz Y., Mirelman D.
Mol. Biochem. Parasitol. 58:329-333(1992).
[ 3] Jank B., Waldherr M., Schweyen R.J. Curr. Genet. 23:15-18(1993).
[ 4] Hatakeyama T., Kimura M. Eur. J. Biochem. 172:703-711(1988).

[1346] 510. Ribosomal protein L21 signature

Ribosomal protein L21 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L21 is known to bind
to the rRNA in the presence of L20. It belongs to a family of ribosomal proteins which, on the basis of sequence
similarities, groups: - Eubacterial L21.

- Marchantia polymorpha chloroplast L21. - Cyanelle L21.
- Spinach chloroplast L21 (nuclear-encoded).

Eubacterial L21 is a protein of about 100 amino-acid residues; the mature form of the spinach chloroplast L21 has 200
residues. A conserved region located in the C-terminal section of these proteins has been selected as a signature
pattern.

■Vmsensus pattern [lVT]- t !3HKRj->tUt)-EKRQ3-K->t(6)-G-[HF]-R-ERQ3->t(2)-[ST] 

[1347] 511. Ribosomal protein L22 signature
Ribosomal protein L22 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L22 is known to bind
23S rRNA. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1,2,3], groups: -
Eubacterial L22.

- Algal and plant chloroplast L22 (in legumes, it is encoded in the nucleus instead of the chloroplast). - Cyanelle
L22. - Archaebacterial L22.

- Mammalian L17. - Plant L17. - Yeast YL17.

A conserved region located in the C-terminal section of these proteins has been selected as a signature pattern.

- Consensus pattern: [RKQN]-x(4)-[RH]-[GAS]-x-G-[KRQS]-x(9)-[HDN]-[LIVM]-x-[LIVMS]-x-[LIVM]

[ 1] Gantt J.S., Baldauf S.L., Calie P.J., Weeden N.F., Palmer J.D.
EMBO J. 10:3073-3078(1991).
[ 2] Madsen L.H., Kreiberg J.D., Gausing K. Curr. Genet. 19:417-422(1991).
[ 3] Otaka E., Hashimoto T., Mizuta K., Suzuki K.
Protein Seq. Data Anal. 5:301-313(1993).

[1348] 512. Ribosomal protein L23 signature

Ribosomal protein L23 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L23 is known to bind
a specific region on the 23S rRNA; in yeast, the corresponding protein binds to a homologous site on the 26S rRNA
[1]. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities [2,3,4], groups: - Eubacterial
L23.

- Algal and plant chloroplast L23. - Archaebacterial L23. - Mammalian L23A.

- Caenorhabditis elegans L23A (F55D10.2). - Fungi L25.

- Yeast mitochondrial YmL41 (gene MRPL41 or MRP20).

[1349] A small conserved region in the C-terminal section of these proteins, which is probably involved in
rRNA-binding, has been selected as a signature pattern [2].

- Consensus pattern: [RK]{2)-[AM]-[IVFYTHIVHRKT}-L-[STANEQK]-xf7HUVMFT3 

[ 1 ] El Baradi T.T.A L . Paue H A., van de Regt C.H.F , Verbree E C : 
*5 Planta R.J. EM BO J. 4:210-2107(1985). 

[ 2] Raue H.A.. Otaka E, Suzuki K. J. Mol. EygI. 28:418-426(1989). 
[ 3] Fearon K„ Mason T.L. J. Biol. Chem. 267:51 62-51 70{1 992). 
[43 Otaka E., Hashimoto T„ Mizuta K. 
FrotemSeq. Data Anal. 5:285--300(19&3l 

[1350] 513. Ribosomal protein L24 signature

Ribosomal protein L24 is one of the proteins from the large ribosomal subunit. L24 belongs to a family of ribosomal proteins which, on the basis of sequence similarities, groups:

- Eubacterial L24.
- Plant chloroplast L24 (nuclear-encoded).
- Red algal L24.
- Vertebrate L26.
- Yeast L26 (YL33).
- Archaebacterial HmaL24 (HL15).
- A probable ribosomal protein from Sulfolobus acidocaldarius [1].

In their mature form, these proteins have 103 to 150 amino-acid residues. A conserved stretch of 20 residues in their N-terminal section has been selected as a signature pattern.

- Consensus pattern: [GDEN]-D-x-V-x-[IV]-[LIVMA]-x-G-x(2)-[KRA]-[GNQ]-x(2,3)-[GA]-x-[IV]

[1] Ouzounis C., Kyrpides N., Sander C. Nucleic Acids Res. 23:565-570(1995).

[1351] 514. Ribosomal protein L24e signature

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities. One of these families consists [1] of:

- Mammalian ribosomal protein L24.
- Yeast ribosomal protein L30A/B (Rp29) (YL21).
- Kluyveromyces lactis ribosomal protein L30.
- Arabidopsis thaliana ribosomal protein L24 homolog.
- Haloarcula marismortui ribosomal protein HL21/HL22.
- Methanococcus jannaschii MJ1201.

These proteins have 60 to 160 amino-acid residues. The most conserved region, which is located in the N-terminal region of these proteins, has been selected as a signature pattern.

- Consensus pattern: [FY]-x-[GSH]-x(2)-[IV]-x-P-G-x-G-x(2)-[FYV]-x-[KRHE]-x-D

[1] Chan Y.-L., Olvera J., Wool I.G. Biochem. Biophys. Res. Commun. 202:1176-1180(1994).
[1352] 515. Ribosomal protein L27 signature

Ribosomal protein L27 is one of the proteins from the large ribosomal subunit. L27 belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1,2], groups:

- Eubacterial L27.
- Plant chloroplast L27 (nuclear-encoded).
- Algal chloroplast L27.
- Yeast mitochondrial YmL2 (gene MRPL2 or MRP7).






The schematic relationship between these groups of proteins is shown below.

Eub. L27        Nxxxxxxxxx
Algal L27       Nxxxxxxxxx
Plant L27  tttttNxxxxxxxxxxxxx
Yeast MRP7   tttNxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
                    ***

't': transit peptide.
'N': N-terminal of mature protein.
'*': position of the pattern.

- Consensus pattern: G-x-[LIVM](2)-x-R-Q-R-G-x(5)-G

[1] Elhag G.A., Bourque D.P. Biochemistry 31:6856-6864(1992).
[2] Otaka E., Hashimoto T., Mizuta K. Protein Seq. Data Anal. 5:285-300(1993).
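The consensus patterns in these entries use a compact notation: `x` for any residue, `[..]` for a set of allowed residues, and `(n)` or `(n,m)` for repeats. As a rough illustration only, the sketch below translates that notation into a Python regular expression and applies it to the L27 pattern. The function name `prosite_to_regex` and the test sequence are hypothetical and are not part of the original disclosure; the sequence is made up solely to contain one matching window.

```python
import re

def prosite_to_regex(pattern):
    """Translate a PROSITE-style consensus pattern into a Python regex string."""
    parts = []
    for element in pattern.strip().split("-"):
        # Separate the residue part from an optional repeat such as (2) or (3,4).
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        residues, low, high = match.groups()
        if residues == "x":
            core = "."                          # x stands for any residue
        elif residues.startswith("{"):
            core = "[^" + residues[1:-1] + "]"  # {..} lists residues that are not allowed
        else:
            core = residues                     # a single letter or a [..] class
        if low is None:
            repeat = ""
        elif high is None:
            repeat = "{" + low + "}"
        else:
            repeat = "{" + low + "," + high + "}"
        parts.append(core + repeat)
    return "".join(parts)

L27_PATTERN = "G-x-[LIVM](2)-x-R-Q-R-G-x(5)-G"
L27_REGEX = prosite_to_regex(L27_PATTERN)

# Made-up sequence (an assumption for illustration) with one matching window.
toy_sequence = "MKTGSIIVRQRGTKFHAGNVL"
hit = re.search(L27_REGEX, toy_sequence)
print(L27_REGEX, hit.start(), hit.group())
```

A regular expression is only one convenient way to apply such a pattern; it reports the first match from the left and does not score partial matches.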

[1353] 516. Ribosomal L28 family

The ribosomal L28 family includes L28 proteins from bacteria and chloroplasts. The L24 protein from yeast (Swiss: P36525) also contains a region of similarity to prokaryotic L28 proteins. L24 from yeast is also found in the large ribosomal subunit.
Number of members: 24
[1354] 517. Ribosomal protein L29 signature

Ribosomal protein L29 is one of the proteins from the large ribosomal subunit. L29 belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1], groups:

- Eubacterial L29.
- Red algal L29.
- Archaebacterial L29.
- Mammalian L35.
- Caenorhabditis elegans L35 (ZK652.4).
- Yeast L35.

L29 is a protein of 63 to 138 amino-acid residues.

A conserved region located in the central section of L29 has been selected as a signature pattern.

- Consensus pattern: [KNQS]-[PSTL]-x(2)-[LIMFA]-[KRGSAN]-x-[LIVYSTA]-[KR]-[KRHQS]-[DESTANRL]-[LIV]-A-[KRCQVT]-[LIVMA]

[1] Otaka E., Hashimoto T., Mizuta K. Protein Seq. Data Anal. 5:285-300(1993).
[1355] 518. Ribosomal protein L3 signature

Ribosomal protein L3 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L3 is known to bind to the 23S rRNA and may participate in the formation of the peptidyltransferase center of the ribosome. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1,2,3,4], groups:

- Eubacterial L3.
- Red algal L3.
- Cyanelle L3.
- Archaebacterial Halobacterium marismortui HmaL3 (HL1).
- Yeast L3 (also known as trichodermin resistance protein) (gene TCM1).
- Arabidopsis thaliana L3 (genes ARP1 and ARP2).
- Mammalian L3 (L4).
- Mammalian mitochondrial L3.
- Yeast mitochondrial YmL9 (gene MRPL9).

A conserved region located in the central section of these proteins has been selected as a signature pattern.

- Consensus pattern: [FL]-x(6)-[DN]-x(2)-[AGS]-x-[ST]-x-G-[KRH]-G-x(2)-G-x(3)-R

[1] Arndt E., Kroemer W., Hatakeyama T. J. Biol. Chem. 265:3034-3039(1990).
[2] Graack H.-R., Grohmann L., Kitakawa M., Schaefer K.L., Kruft V. Eur. J. Biochem. 208:373-380(1992).
[3] Herwig S., Kruft V., Wittmann-Liebold B. Eur. J. Biochem. 207:877-885(1992).
[4] Otaka E., Hashimoto T., Mizuta K., Suzuki K. Protein Seq. Data Anal. 5:301-313(1993).

[1356] 519. Ribosomal protein L30 signature

Ribosomal protein L30 is one of the proteins from the large ribosomal subunit. L30 belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1], groups:

- Eubacterial L30.
- Archaebacterial L30.
- Drosophila L7.
- Slime mold L7.
- Mammalian L7.
- Fungi L7 (YL8).
- Yeast mitochondrial L33.

L30 from eubacteria are small proteins of about 60 residues; those from archaebacteria are proteins of about 150 residues. Eukaryotic L7 are proteins of about 250 to 270 residues. The schematic relationship between the three groups of proteins is shown below.

Eub. L30  NxxxxxxxxxxC
Arc. L30  NxxxxxxxxxxxxxxxxxxxxxxxxxxxC
Euk. L7   NxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxC
           *******

'*': position of the pattern.

The signature pattern for this family of ribosomal proteins spans the N-terminal half of the region common to all these proteins.

- Consensus pattern: [IVT]-[LIVM]-x(2)-[LF]-x-[LI]-x-[KRHQEG]-x(2)-[STNQH]-x-[IVT]-x(10)-[LMS]-[LIV]-x(2)-[LIVA]-x(2)-[LMFY]-[IVT]

[1] Mizuta K., Hashimoto T., Otaka E. Nucleic Acids Res. 20:1011-1016(1992).
[1357] 520. Ribosomal protein L31 signature

Ribosomal protein L31 is one of the proteins from the large ribosomal subunit. L31 is a protein of 86 to 97 amino-acid residues which has only been found so far in eubacteria and in some algal chloroplasts.
A conserved region located in the central section of these proteins has been selected as a signature pattern.

- Consensus pattern- H-P-F-[FYHTf]-x(9)-G-R-[AIV]-x-[KRQ] 

[1358] 521. Ribosomal protein L31e signature

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities. One of these families consists of:

- Mammalian L31 [1].
- Chlamydomonas reinhardtii L31.
- Yeast L34.
- Halobacterium marismortui HL30 [2].

These proteins have 87 to 128 amino-acid residues.

A conserved region, located in the central section, has been selected as a signature pattern.

- Consensus pattern: V-[KR]-[LIVM]-x(3)-[LIVM]-N-x-[AKH]-x-W-x-[KR]-G

[1] Tanaka T., Kuwano Y., Kuzumaki T., Ishikawa K., Ogata K. Eur. J. Biochem. 162:45-48(1987).
[2] Bergmann U., Arndt E. Biochim. Biophys. Acta 1050:56-60(1990).
[1359] 522. Ribosomal protein L33 signature

Ribosomal protein L33 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L33 has been shown to be on the surface of the 50S subunit. L33 belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1,2,3], groups:

- Eubacterial L33.
- Algal and plant chloroplast L33.
- Cyanelle L33.

L33 is a small protein of 49 to 66 amino-acid residues. A conserved region located in the central section of L33 has been selected as a signature pattern.

- Consensus pattern: Y-x-[ST]-x-[KR]-[NS]-x(4)-[PATQ]-x(1,2)-[LIVM]-[EA]-x(2)-K-[FY]-[CSD]

[1] Kruft V., Kapp U., Wittmann-Liebold B. Biochimie 73:855-860(1991).
[2] Sharp P.M. Gene 139:129-130(1994).
[3] Otaka E., Hashimoto T., Mizuta K. Protein Seq. Data Anal. 5:285-300(1993).

[1360] 523. Ribosomal protein L34 signature

[1361] Ribosomal protein L34 is one of the proteins from the large subunit of the prokaryotic ribosome. It is a small basic protein of 44 to 51 amino-acid residues [1]. L34 belongs to a family of ribosomal proteins which, on the basis of sequence similarities, groups:

- Eubacterial L34.
- Red algal chloroplast L34.
- Cyanelle L34.

A conserved region that corresponds to the N-terminal half of L34 has been selected as a signature pattern.

- Consensus pattern: K-[RG]-T-[FYWL]-[EQS]-x(5)-[KRHS]-x(4,5)-G-F-x(2)-R

[1] Old I.G., Margarita D., Saint Girons I. Nucleic Acids Res. 20:6097-6097(1992).

[1362] 524. Ribosomal protein L34e signature

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities. One of these families consists of:

- Mammalian L34.
- Mosquito L31 [1].
- Plant L34 [2].
- Yeast putative ribosomal protein YIL052c.
- Methanococcus jannaschii MJ0655.

These proteins have 89 to 129 amino-acid residues.

A conserved region located in the N-terminal section of these proteins has been selected as a signature pattern.

- Consensus pattern: Y-x-[ST]-x-S-[NY]-x(5)-[KR]-T-P-G

[1] Lan Q., Niu L.L., Fallon A.M. Biochim. Biophys. Acta 1218:460-482(1994).
[2] Gao J., Kim S.R., Chung Y.Y., Lee J.M., An G. Plant Mol. Biol. 25:761-770(1994).

[1363] 525. Ribosomal protein L35Ae signature

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities. One of these families consists of:

- Vertebrate L35A.
- Caenorhabditis elegans L35A (F10E7.7).
- Yeast L37A/L37B (Rp47).
- Pyrococcus woesei L35A homolog [1].

These proteins have 87 to 110 amino-acid residues.

A highly conserved stretch of 22 residues in the C-terminal part of these proteins has been selected as a signature pattern.

- Consensus pattern: G-K-[LIVM]-x-R-x-H-G-x(2)-G-x-V-x-A-x-F-x(3)-[LI]-P

[1] Ouzounis C., Kyrpides N., Sander C. Nucleic Acids Res. 23:565-570(1995).
[1364] 526. Ribosomal protein L36 signature

Ribosomal protein L36 is the smallest protein from the large subunit of the prokaryotic ribosome. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1], groups:

- Eubacterial L36.
- Algal and plant chloroplast L36.
- Cyanelle L36.

L36 is a small basic and cysteine-rich protein of 37 amino-acid residues. As a signature pattern, a conserved region that corresponds to positions 11 to 36 in L36 and includes three conserved cysteine residues has been developed.

- Consensus pattern: C-x(2)-C-x(2)-[LIVM]-x-R-x(3)-[LIVMN]-x-[LIVM]-x-C-x(3,4)-[KR]-H-x-Q-x-Q

[1] Otaka E., Hashimoto T., Mizuta K. Protein Seq. Data Anal. 5:285-300(1993).
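The L36 entry states that the signature covers positions 11 to 36 (26 residues) and includes three conserved cysteines. A short sketch, assuming the pattern reads C-x(2)-C-x(2)-[LIVM]-x-R-x(3)-[LIVMN]-x-[LIVM]-x-C-x(3,4)-[KR]-H-x-Q-x-Q, can check both figures directly from the pattern string. The helper names `length_range` and `fixed_positions` are hypothetical, introduced only for this illustration.

```python
import re

# Pattern as read from the L36 entry (an assumption where the print is unclear).
L36_PATTERN = "C-x(2)-C-x(2)-[LIVM]-x-R-x(3)-[LIVMN]-x-[LIVM]-x-C-x(3,4)-[KR]-H-x-Q-x-Q"

def length_range(pattern):
    """Return the shortest and longest stretch of residues the pattern can cover."""
    shortest = longest = 0
    for element in pattern.split("-"):
        # A trailing (n) or (n,m) gives the repeat count; otherwise one residue.
        repeat = re.search(r"\((\d+)(?:,(\d+))?\)$", element)
        if repeat:
            low = int(repeat.group(1))
            high = int(repeat.group(2)) if repeat.group(2) else low
        else:
            low = high = 1
        shortest += low
        longest += high
    return shortest, longest

def fixed_positions(pattern, residue):
    """Count pattern elements that require exactly the given residue."""
    return sum(1 for element in pattern.split("-") if element == residue)

print(length_range(L36_PATTERN))
print(fixed_positions(L36_PATTERN, "C"))
```

Under that reading the pattern spans 26 or 27 residues, depending on the x(3,4) gap, and fixes three cysteine positions, which agrees with the description above.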
[1365] 527. Ribosomal protein L36e signature

A number of eukaryotic ribosomal proteins can be grouped on the basis of sequence similarities. One of these families consists of:

- Mammalian L36 [1].
- Drosophila L36 (M(1)1B).
- Caenorhabditis elegans L36 (F37C12.4).
- Candida albicans L39.
- Yeast YL39.

These proteins have 99 to 104 amino acids.
A conserved region in the central part of these proteins has been selected as a signature pattern.

- Consensus pattern: P-Y-E-[KR]-R-x-[LIVM]-[DE]-[LIVM](2)-[KR]

[1] Chan Y.-L., Paz V., Olvera J., Wool I.G.

Bioihem Btoph\s Res Coinmun 1 S# 845» SE> 3i 1 3 > 
[1366] 528. Ribosomal protein L39e signature

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities. One of these families consists of:

- Mammalian L39 [1].
- Plants L39.
- Yeast L46 [2].
- Archaebacterial L39e [3].

These proteins are very basic. About 50 residues long, they are the smallest proteins of eukaryotic-type ribosomes. A conserved region in the C-terminal section of these proteins has been selected as a signature pattern.

- Consensus pottwn j> PA]-T-> l 3)-(LIVM]-EKROFj-x-[NHS3-<.(5)-R-[NH , >']-W-R-R 

i5 

[1] Lin A., McNally J., Wool I.G. J. Biol. Chem. 259:487-490(1984).
[2] Leer R.J., van Raamsdonk-Duin M.M.C., Kraakman P., Mager W.H., Planta R.J. Nucleic Acids Res. 13:701-709(1985).
[3] Ramirez C., Louie K.A., Matheson A.T. FEBS Lett. 250:416-418(1989).

[1367] 529. Ribosomal L40e family

Bovine L40 has been identified as a secondary RNA binding protein [1]. L40 is fused to a ubiquitin protein [2].
Number of members: 27

[1] Medline: 88203200
RNA binding proteins of the large subunit of bovine mitochondrial ribosomes.
Piatyszek MA, Denslow ND, O'Brien TW.
Nucleic Acids Res 1988;16:2565-2583.

[2] Medline: 96011835
The carboxyl extensions of two rat ubiquitin fusion proteins are ribosomal proteins S27a and L40.
Chan YL, Suzuki K, Wool IG.
Biochem Biophys Res Commun 1995;215:682-690.
[1368] 530. Ribosomal protein L44e signature

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities. One of these families consists of:

- Mammalian L44 [1].
- Trypanosoma brucei L44.
- Caenorhabditis elegans L44 (C09H10.2).
- Fungal L44 (L41).
- Halobacterium marismortui LA [2].

Tru^f* ( fofi : 'ins haw 02 to 10^ amine -acid r^ioue S 

A conserved region located in the C-terminal part of these proteins has been selected as a signature pattern.

- Ctmensus pattern K-> -|TVj-K-K <(2) L [bR] <u?)-C 

45 

[ 1 j Gailagner M J Chan ^ -L Lin A , Woo! I G DNA 7 2G°-27 0 i 193e) 
[2] Bergmann U., W'ittmann-Liebold B. 
Bio^-him Biophys Act=i H v 3 1^5-200{ 1^03 

[1369] 531. Ribosomal protein L5 signature

Ribosomal protein L5 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L5 is known to be involved in binding 5S RNA to the large ribosomal subunit. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1,2,3,4], groups:

- Eubacterial L5.
- Algal chloroplast L5.
- Cyanelle L5.
- Archaebacterial L5.
- Mammalian L11.
- Tetrahymena thermophila L21.
- Slime mold L5 (V18).
- Yeast L16 (39A).
- Plants mitochondrial L5.

L5 is a protein of about 180 amino-acid residues.

A conserved region, located in the first third of these proteins, has been selected as a signature pattern.

- Consensus pattern; [LIVM]~x(2HLIVMHSTAVCHG^^^ 

s 

[1] Hatakeyama T., Hatakeyama T. Biochim. Biophys. Acta 1039:343-347(1990).
[2] Rosendahl G., Andreasen P.H., Kristiansen K. Gene 96:161-167(1991).
[3] Yang D., Gunther I., Matheson A.T., Auer J., Spicker G., Boeck A. Biochimie 73:679-682(1991).
[4] Otaka E., Hashimoto T., Mizuta K., Suzuki K. Protein Seq. Data Anal. 5:301-313(1993).

10 

[1370] 532. Ribosomal L5P family C-terminus

[1371] This region is found associated with Ribosomal_L5. Number of members: 60
[1372] 533. Ribosomal protein L6 signatures

[1373] Ribosomal protein L6 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L6 is known to bind directly to the 23S rRNA and is located at the aminoacyl-tRNA binding site of the peptidyltransferase center. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1,2,3,4], groups:

- Eubacterial L6.
- Algal chloroplast L6.
- Cyanelle L6.
- Archaebacterial L6.
- Marchantia polymorpha mitochondrial L6.
- Yeast mitochondrial YmL6 (gene MRPL6).
- Mammalian L9.
- Drosophila L9.
- Plants L9.
- Yeast L9 (YL11).

[1374] While all the above proteins are evolutionarily related, it is very difficult to derive a pattern that will find them all. Two patterns were therefore created, the first to detect eubacterial, cyanelle and mitochondrial L6, the second to detect archaebacterial L6 as well as eukaryotic L9.

- Consensus pattern: [PS]-[DENS]-x-Y-K-[GA]-K-G-[LIVM]

- Consensus pattern: Q-x(3)-[LIVM]-x(2)-[KR]-x(2)-R-x-F-x-D-G-[LIVM]-Y-[LIVM]-x(2)-[KR]

[1] Suzuki K., Olvera J., Wool I.G. Gene 93:297-300(1990).
[2] Schwank S., Harrer R., Schueller H.-J., Schweizer E. Curr. Genet. 24:138-140(1993).
[3] Golden B.L., Ramakrishnan V., White S.W. EMBO J. 12:4901-4908(1993).
[4] Otaka E., Hashimoto T., Mizuta K., Suzuki K. Protein Seq. Data Anal. 5:301-313(1993).
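Because the L6 family is covered by two separate patterns, a sequence has to be tested against both to decide which subgroup, if any, it resembles. The sketch below shows one way to do that with regular expressions. It assumes the two patterns read [PS]-[DENS]-x-Y-K-[GA]-K-G-[LIVM] and Q-x(3)-[LIVM]-x(2)-[KR]-x(2)-R-x-F-x-D-G-[LIVM]-Y-[LIVM]-x(2)-[KR]; the name `classify` and all test sequences are hypothetical and made up for illustration.

```python
import re

# Regex forms of the two L6 consensus patterns (as read from the entry above;
# treat the exact residue classes as an assumption where the print is unclear).
L6_PATTERNS = {
    "eubacterial/cyanelle/mitochondrial L6": re.compile(
        "[PS][DENS].YK[GA]KG[LIVM]"),
    "archaebacterial L6 / eukaryotic L9": re.compile(
        "Q.{3}[LIVM].{2}[KR].{2}R.F.DG[LIVM]Y[LIVM].{2}[KR]"),
}

def classify(sequence):
    """Return the labels of every L6 pattern found somewhere in the sequence."""
    return [label for label, regex in L6_PATTERNS.items()
            if regex.search(sequence)]

# Made-up sequences, each built to satisfy at most one pattern.
print(classify("AAPDTYKGKGIRR"))
print(classify("MMQAAALAAKAARAFADGIYVAAKTT"))
print(classify("MKTAYIAKQR"))
```

Returning a list rather than a single label keeps the case where neither pattern matches, or where both do, visible to the caller.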

[1375] 534. Ribosomal protein L6e signature

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities. One of these families consists of:

- Mammalian ribosomal protein L6 (L6 was previously known as TAX-responsive enhancer element binding protein 107).
- Caenorhabditis elegans ribosomal protein L6 (R151.3).
- Yeast ribosomal protein YL16A/YL16B.
- Mesembryanthemum crystallinum ribosomal protein YL16-like.

These proteins have 175 (yeast) to 287 (mammalian) amino acids. A highly conserved region in the central part of these proteins has been selected as a signature pattern.

- Consensus pattern: N-x(2)-P-L-R-R-x(4)-[FY]-V-I-A-T-S-x-K

[1376] 535. Ribosomal protein L7Ae signature

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities. One of these families consists of:

- Vertebrate L7A (SURF3) [1].
- Plant L7A.
- Yeast L7A (YL5) (Rp8).
- Yeast protein NHP2 [2].
- Yeast hypothetical protein YEL026w.
- Bacillus subtilis hypothetical protein ylxQ.
- Halobacterium marismortui Hs8.
- Methanococcus jannaschii MJ1203.

[1377] These proteins have 100 to 265 amino-acid residues.

A conserved region located in the central section has been selected as a signature pattern.

- Consensus pattern: [CA]-x(4)-[IV]-P-[FY]-x(2)-[LIVM]-x-[GSQ]-[KRQ]-x(2)-L-G

[1] Colombo P., Yon J., Garson K., Fried M. Proc. Natl. Acad. Sci. U.S.A. 89:6358-6362(1992).
[2] Kolodrubetz D., Burgum A. Yeast 7:79-90(1991).

[1378] 536. Ribosomal protein L9 signature

Ribosomal protein L9 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L9 is known to bind directly to the 23S rRNA. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1,2], groups:

- Eubacterial L9.
- Cyanobacterial L9.
- Plant chloroplast L9 (nuclear-encoded).
- Red algal chloroplast L9.

A conserved region, located in the N-terminal section of these proteins, has been selected as a signature pattern.

- Consensus pattern: G-x(2)-[GN]-x(4)-V-x(2)-G-[FY]-x(2)-N-[FY]-L-x(5)-[GA]-x(3)-[STN]

[1] Hoffman D.W., Davies C., Gerchman S.E., Kycia J.H., Porter S.J., White S.W., Ramakrishnan V. EMBO J. 13:205-212(1994).
[2] Otaka E., Hashimoto T., Mizuta K., Suzuki K. Protein Seq. Data Anal. 5:301-313(1993).

[1379] 537. Ribosomal protein S10 signature

Ribosomal protein S10 is one of the proteins from the small ribosomal subunit. In Escherichia coli, S10 is known to be involved in binding tRNA to the ribosomes. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1], groups:

- Eubacterial S10.
- Algal chloroplast S10.
- Cyanelle S10.
- Archaebacterial S10.
- Marchantia polymorpha and Prototheca wickerhamii mitochondrial S10.
- Arabidopsis thaliana mitochondrial S10 (nuclear encoded).
- Vertebrate S20.
- Plant S20.
- Yeast URP2.

S10 is a protein of about 100 amino-acid residues.
[1380] A conserved region located in the center of these proteins has been selected as a signature pattern.

- Consensus pattern: [AV]-x(3)-[GDNSR]-[LIVMSTA]-x(3)-G-P-[LIVM]-x-[LIVM]-P-T

[1] Otaka E., Hashimoto T., Mizuta K. Protein Seq. Data Anal. 5:285-300(1993).

[1381] 538. Ribosomal protein S11 signature

Ribosomal protein S11 [1] plays an essential role in selecting the correct tRNA in protein biosynthesis. It is located on the large lobe of the small ribosomal subunit. S11 belongs to a family of ribosomal proteins which, on the basis of sequence similarities, groups [2]:

- Eubacterial S11.
- Algal and plant chloroplast S11.
- Cyanelle S11.
- Archaebacterial S11.
- Marchantia polymorpha and Prototheca wickerhamii mitochondrial S11.
- Acanthamoeba castellanii mitochondrial S11.
- Neurospora crassa S14 (crp-2).
- Yeast S14 (RP59 or CRY1).
- Mammalian, Drosophila, Trypanosoma, and plant S14.
- Caenorhabditis elegans S14 (F37C12.9).

One of the best conserved regions in these proteins was selected as a signature pattern.

- Consensus pattern: [LIVMF]-x-[GSTAC]-[LIVMF]-x(2)-[GSTAL]-x(0,1)-[GSN]-[LIVMF]-x-[LIVM]-x(4)-[DEN]-x-T-P-x-[PA]-[STCH]-[DN]

[1] Kimura M., Kimura J., Hatakeyama T. FEBS Lett. 240:15-20(1988).
[2] Otaka E., Hashimoto T., Mizuta K. Protein Seq. Data Anal. 5:285-300(1993).

[1382] 539. Ribosomal protein S12 signature

Ribosomal protein S12 is one of the proteins from the small ribosomal subunit. In Escherichia coli, S12 is known to be involved in the translation initiation step. It is a very basic protein of 120 to 150 amino-acid residues. S12 belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1], groups:

- Eubacterial S12.
- Archaebacterial S12.
- Algal and plant chloroplast S12.
- Cyanelle S12.
- Protozoa and plant mitochondrial S12.
- Yeast S28.
- Drosophila mitochondrial protein tko (Technical KnockOut).
- Mammalian S23.

The best conserved regions in these proteins, located in the center of each sequence, have been selected as a signature pattern.

- Consensus pattern: [RK]-x-P-N-S-[AR]-x-R

[1] Otaka E., Hashimoto T., Mizuta K. Protein Seq. Data Anal. 5:285-300(1993).
[1383] 540. Ribosomal protein S12e signature

A number of eukaryotic ribosomal proteins can be grouped on the basis of sequence similarities. One of these families consists of:

- Vertebrate S12 [1].
- Trypanosoma brucei S12 [2].
- Caenorhabditis elegans S12 (F54E7.2).
- Drosophila S12.
- Yeast S12.

These proteins have 130 to 150 amino acids.
A conserved region in the N-terminal part of these proteins has been selected as a signature pattern.

- Consensus pattern: A-L-[KRQP]-x-V-L-x(2)-[SA]-x(3)-[QN]-G-L

[1] Lin A., Chan Y.-L., Jones R., Wool I.G. J. Biol. Chem. 262:14343-14351(1987).
[2] Marchal C., Ismaili N., Pays E. Mol. Biochem. Parasitol. 57:331-334(1993).
[1384] 541. Ribosomal protein S13 signature

Ribosomal protein S13 is one of the proteins from the small ribosomal subunit. In Escherichia coli, S13 is known to be involved in binding fMet-tRNA and, hence, in the initiation of translation. It is a basic protein of 115 to 177 amino-acid residues and belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1,2], groups:

- Eubacterial S13.
- Plant chloroplast S13 (nuclear encoded).
- Red algal chloroplast S13.
- Cyanelle S13.
- Archaebacterial S13.
- Plant mitochondrial S13.
- Mammalian and plant S18.

The best conserved regions in these proteins, located in their C-terminal part, have been selected as a signature pattern.

- Consensus pattern: [KRQS]-G-x-R-H-x(2)-[GSNH]-x(2)-[LIVMC]-R-G-Q

[1] Chan Y.-L., Paz V., Wool I.G. Biochem. Biophys. Res. Commun. 178:1212-1218(1991).
[2] Otaka E., Hashimoto T., Mizuta K. Protein Seq. Data Anal. 5:285-300(1993).

[1385] 542. Ribosomal protein S14p/S29e (Ribosomal protein S14 signature)

[1386] Ribosomal protein S14 is one of the proteins from the small ribosomal subunit. In Escherichia coli, S14 is known to be required for the assembly of 30S particles and may also be responsible for determining the conformation of 16S rRNA at the A site. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1,2], groups:

- Eubacterial S14.
- Algal and plant chloroplast S14.
- Cyanelle S14.
- Archaebacterial Methanococcus vannielii S14.
- Plant mitochondrial S14.
- Yeast mitochondrial MRP2.
- Mammalian S29.
- Yeast YS29A/B.

[1387] S14 is a piotein of 5o to 115 amino-^cid lesidu*^ Oui sign^tuie pattern is ba*.ed on the few conserved 
positions located in the center of these proteins. 

[1388] Consensus pattern [PP]-^0 1 S-C-vf 11 12HUVMF1-Y4JVMF]f3Gj-[KG3-\f3HRN3 

[13 Chan ^ -L Sieuki K Given I Wool I G Nucleic Acids R** 21 64^-666(10^31 
[?3 0taka[- Hashimoto T Mf.-rut.t h Ptotein c>q Data Anal 6- 1993} 

[1389] 543. Ribosomal protein S15 signature

Ribosomal protein S15 is one of the proteins from the small ribosomal subunit. In Escherichia coli, this protein binds to 16S ribosomal RNA and functions at early steps in ribosome assembly. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1,2], groups:

- Eubacterial S15.
- Archaebacterial Halobacterium marismortui HmaS15 (HS11).
- Plant chloroplast S15.
- Yeast mitochondrial S28.
- Mammalian S13.

Brucjta pahangi and WuHierena banaofti bl 3 fS fji ■ /east SI ^ (\ Slit 

S15 is a protein of 80 to 250 amino-acid residues.

A conserved region located in the C-terminal part of these proteins has been selected as a signature pattern.

- Co iberr^ pattern [L!VMHi2VH-[LIVMFV3-^f )-D-<i2HCAGN3-k(3HLF}-< i 9V[LIVM3-x{2HF>'] 

[1] Dang H., Ellis S.R. Nucleic Acids Res. 18:6895-6901(1990).
[2] Otaka E., Hashimoto T., Mizuta K. Protein Seq. Data Anal. 5:285-300(1993).

[1390] 544. Ribosomal protein S16 signature

[1391] Ribosomal protein S16 is one of the proteins from the small ribosomal subunit. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1], groups:

- Eubacterial S16.
- Algal and plant chloroplast S16.
- Cyanelle S16.
- Neurospora crassa mitochondrial S24 (cyt-21).

[1392] S16 is a protein of about 100 amino-acid residues. A conserved region located in the N-terminal extremity of these proteins has been selected as a signature pattern.
[1393] Consensus pattern: [LIVMT]-x-[LIVM]-[KR]-L-[STAK]-R-x-G-[AKR]
[1394] [1] Otaka E., Hashimoto T., Mizuta K. Protein Seq. Data Anal. 5:285-300(1993).
[1395] 545. Ribosomal protein S17 signature

Ribosomal protein S17 is one of the proteins from the small ribosomal subunit. In Escherichia coli, S17 is known to bind specifically to the 5' end of 16S ribosomal RNA and is thought to be involved in the recognition of termination codons. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1,2,3], groups:

- Eubacterial S17.
- Plant chloroplast S17 (nuclear encoded).
- Red algal chloroplast S17.
- Cyanelle S17.
- Archaebacterial S17.
- Mammalian and plant cytoplasmic S11.
- Yeast S18a and S18b (RP41; YS12).

The best conserved regions located in the C-terminal sections of these proteins have been selected as a signature pattern.

- Consensus pattern: G-D-x-[LIV]-x-[LIVA]-x-[QEK]-x-[RK]-P-[LIV]-S

[1] Gantt J.S., Thompson M.D. J. Biol. Chem. 265:2763-2767(1990).
[2] Herfurth E., Hirano H., Wittmann-Liebold B. Biol. Chem. Hoppe-Seyler 372:955-961(1991).
[3] Otaka E., Hashimoto T., Mizuta K. Protein Seq. Data Anal. 5:285-300(1993).

[1396] 546. Ribosomal protein S17e signature

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities. One of these families consists of:

- Vertebrates S17 [1].
- Drosophila S17 [2].
- Neurospora crassa S17 (crp-3).
- Yeast S17a (RP51A) and S17b (RP51B) [3].
- Methanococcus jannaschii MJ0245.

These proteins have from 63 (in archaebacteria) to 130 to 146 amino acids and are highly conserved. A region in the central part of these proteins has been selected as a signature.

- Consensus pattern: A-x-I-x-[ST]-K-x-L-R-N-[KR]-I-A-G-[FY]-x-T-H

[1] Chen I.-T., Roufa D.J. Gene 70:107-118(1988).
[2] Maki C., Rhoads D.D., Stewart M.J., van Slyke B., Denell R.E., Roufa D.J. Gene 79:289-298(1989).
[3] Abovich N., Rosbash M. Mol. Cell. Biol. 4:1871-1879(1984).

[1397] 547. Ribosomal protein S18 signature

Ribosomal protein S18 is one of the proteins from the small ribosomal subunit. In Escherichia coli, S18 has been involved in aminoacyl-tRNA binding [1]. It appears to be situated at the tRNA A-site of the ribosome. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities [2], groups:

- Eubacterial S18.
- Algal and plant chloroplast S18.
- Cyanelle S18.

As a signature pattern, a conserved region in the central section of the protein has been selected. This region contains two basic residues which may be involved in RNA-binding.

- Consensus pattern: [IV]-[DY]-Y-x(2)-[LIVMT]-x(2)-[LIVM]-x(2)-[FYT]-[LIVM]-[ST]-[DERP]-x-[GY]-K-[LIVM]-x(3)-R-[LIVMAS]

[1] McDougall J., Choli T., Kruft V., Kapp U., Wittmann-Liebold B. FEBS Lett. 245:253-260(1989).
[2] Otaka E., Hashimoto T., Mizuta K. Protein Seq. Data Anal. 5:285-300(1993).
[1398] 548. Ribosomal protein S19 signature

Ribosomal protein S19 is one of the proteins from the small ribosomal subunit. In Escherichia coli, S19 is known to form a complex with S13 that binds strongly to 16S ribosomal RNA. S19 belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1,2], groups:

- Eubacterial S19.
- Algal and plant chloroplast S19.
- Cyanelle S19.
- Archaebacterial S19.
- Plant mitochondrial S19.
- Eukaryotic S15 ('rig' protein).

S19 is a protein of 88 to 144 amino-acid residues. Our signature pattern is based on the few conserved positions located in the C-terminal section of these proteins.

- Consensus pattern: [STDNQ]-G-[KRQM]-x(6)-[LIVM]-x(4)-[LIVM]-[GSD]-x(2)-[LF]-[GAS]-[DE]-F-x(2)-[ST]

[1] Kitagawa M., Takasawa S., Kikuchi N., Itoh T., Teraoka H., Yamamoto H., Okamoto H. FEBS Lett. 283:210-214(1991).
[2] Otaka E., Hashimoto T., Mizuta K. Protein Seq. Data Anal. 5:285-300(1993).

[1399] 549. Ribosomal protein S19e signature

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities [1,2]. One of these families consists of:

- Mammalian S19.
- Drosophila S19.
- Ascaris lumbricoides S19g (ALEP-1) and S19s.
- Yeast YS18 (RP55A and RP55B).
- Aspergillus S16.
- Halobacterium marismortui HS12.

These proteins have 143 to 155 amino acids.

A well conserved stretch of 20 residues in the C-terminal part of these proteins has been selected as a signature pattern.

- Consensus pattern: P-x(6)-[SAN]-x(2)-[LIVMA]-x-R-x-[ALIV]-[LV]-Q-x-L-[EQ]

[1] Etter A., Aboutanos M., Tobler H., Mueller F. Proc. Natl. Acad. Sci. U.S.A. 88:1593-1596(1991).
[2] Suzuki K., Olvera J., Wool I.G. Biochimie 72:299-302(1990).

»5 £14003 550. Ribosomai protein S2 signatures 

Ribosomal protein S2 is one of the proteins from the small ribosomal subunit. S2 belongs to a family of ribosomal
proteins which, on the basis of sequence similarities [1,2], groups: - Eubacterial S2. - Algal and plant chloroplast S2.

- Cyanelle S2. - Archaebacterial S2.

- Higher eukaryotes P40 (previously thought to be a laminin receptor).
- Yeast NAB1. - Plant mitochondrial S2. - Yeast mitochondrial MRP4.

S2 is a protein of 235 to 394 amino-acid residues.

Two conserved regions have been selected as signature patterns. One is located in the N-terminal section and the
other in the central section.

- Consensus pattern: [LIVMFA]-x(2)-[LIVMFYC](2)-x-[STAC]-[GSTANQEKR]-[STALV]-[HY]-[LIVMF]-G

- Consensus pattern: P-x(2)-[LIVMF](2)-[LIVMS]-x-[GDN]-x(3)-[DENL]-x(3)-[LIVM]-x-E-x(4)-[GNQKRH]-[LIVM]-[AP]

[1] Davis S.C., Tzagoloff A., Ellis S.R.
J. Biol. Chem. 267:5508-5514(1992).

[2] Tohgo A., Takasawa S., Munakata H., Yonekura H., Hayashi N., Okamoto H. FEBS Lett. 340:133-138(1994).


[1401] 551. Ribosomal protein S21 signature

[1402] Ribosomal protein S21 is one of the proteins from the small ribosomal subunit. So far S21 has only been
found in eubacteria. It is a protein of 55 to 70 amino-acid residues. A conserved region in the N-terminal section of the
protein has been selected as a signature pattern.
[1403] Consensus pattern: [DE]-x-A-[LIY]-[KR]-R-F-K-[KR]-x(3)-[KR]
[1404] 552. Ribosomal protein S21e signature

A number of eukaryotic ribosomal proteins can be grouped on the basis of sequence similarities. One of these families
consists of: - Mammalian S21 [1].

- Caenorhabditis elegans S21 (F37C12.11). - Rice S21 [2].

- Yeast S21 (Ys25) [3]. - Fission yeast S28 [4].

These proteins have 82 to 87 amino acids.

A perfectly conserved nonapeptide in the N-terminal part of these proteins has been selected as a signature pattern.

- Consensus pattern: L-Y-V-P-R-K-C-S-[SA]

[1] Bhat K.S., Morrison S.G. Nucleic Acids Res. 21:2939-2939(1993).
[2] Nishi R., Hashimoto H., Uchimiya H., Kato A. Biochim. Biophys. Acta 1216:113-114(1993).
[3] Suzuki K., Otaka E. Nucleic Acids Res. 16:6223-6223(1988).
[4] Itoh T., Otaka E., Matsui K.A. Biochemistry 24:7418-7423(1985).

[1405] 553. Ribosomal protein S24e signature
A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities.
One of these families consists of:

- Vertebrate S24 [1]. - Yeast Rp50. - Mucor racemosus S24 [2].

- Halobacterium marismortui HS15 [3]. - Methanococcus jannaschii MJ0394.
These proteins have 101 to 148 amino acids.

A well conserved stretch in the central part of these proteins has been selected as a signature pattern.

- Consensus pattern: [FYA]-G-x(2)-[KR]-[STA]-x-G-[FY]-[GA]-x-[LIVM]-Y-[DN]-[SDN]

[1] Brown S.J., Jewell A., Maki C.G., Roufa D.J. Gene 91:293-296(1990).
[2] Sosa L., Fonzi W.A., Sypherd P.S.

[1406] Nucleic Acids Res. 17:9319-9331(1989).
[3] Kimura J., Arndt E., Kimura M. FEBS Lett. 224:65-70(1987).
[1407] 554. Ribosomal protein S26e signature

A number of eukaryotic ribosomal proteins can be grouped on the basis of sequence similarities. One of these families
consists of: - Mammalian S26 [1].

- Octopus S26 [2]. - Drosophila S26 (DS31) [3]. - Plant cytoplasmic S26.

- Fungi S26 [4].

These proteins have 114 to 127 amino acids.

A conserved octapeptide in the central part of these proteins has been selected as a signature pattern.

- Consensus pattern: [YH]-C-V-S-C-A-I-H

[1] Kuwano Y., Nakanishi O., Nabeshima Y., Tanaka T., Ogata K. J. Biochem. 97:963-992(1985).
[2] Zinov'eva R.D., Tomarev S.I. Dokl. Akad. Nauk SSSR 304:464-469(1989).
[3] Itoh N., Ohta K., Ohta M., Kawasaki T., Yamashina I. Nucleic Acids Res. 17:2121-2121(1989).
[4] Wu M., Tan H. Gene 150:401-402(1994).

[1408] 555. Ribosomal protein S28e signature

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities.
One of these families consists of:

- Mammalian S28 [1]. - Plant S28 [2]. - Fungi S33 [3].
- Methanococcus jannaschii MJ1202.

These proteins have from 64 to 78 amino acids.

A highly conserved nonapeptide from the C-terminal extremity of these proteins has been selected as a signature
pattern.

- Consensus pattern: E-[ST]-E-R-E-A-R-x-L


[1] Chan Y.-L., Olvera J., Wool I.G.

Biochem. Biophys. Res. Commun. 179:314-318(1991).

[2] Hwang I., Goodman H.M. Plant Physiol. 102:1357-1358(1993).

[3] Hoekstra R., Ferreira P.M., Bootsman T.C., Mager W.H., Planta R.J. Yeast 8:949-959(1992).
[1409] 556. Ribosomal protein S3Ae signature

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities.
One of these families consists of:

- Mammalian S3A (was originally known as v-fos transformation effector protein). - Caenorhabditis elegans S3A
(F56F3.5).

- Plant cytoplasmic S3A (CYC07) [1]. - Yeast Rp10 (PLC1 and PLC2).

- Fission yeast Rp10 (SpAC13G6.02c). - Methanococcus jannaschii MJ0980.
These proteins have from 220 to 250 amino acids.

A conserved stretch in their N-terminal section was selected as a signature pattern.

- Consensus pattern: [LIV]-x-[GH]-R-[IV]-x-E-x-[SC]-L-x-D-L


[1] Liu J.H., Reid D.M.

Plant Physiol. 109:338-338(1995).

[1410] 557. Ribosomal protein S3 signature

Ribosomal protein S3 is one of the proteins from the small ribosomal subunit. In Escherichia coli, S3 is known to be
involved in the binding of initiator Met-tRNA. It belongs to a family of ribosomal proteins which, on the basis of sequence
similarities [1], groups: - Eubacterial S3.

- Algal and plant chloroplast S3. - Cyanelle S3. - Archaebacterial S3.
- Plant mitochondrial S3. - Vertebrate S3. - Insect S3.
- Caenorhabditis elegans S3 (C23G10.3). - Yeast S3 (Rp13).

S3 is a protein of 209 to 559 amino-acid residues.

A conserved region located in the C-terminal section has been selected as a signature pattern.

- Consensus pattern: [GSTA]-[KR]-x(6)-G-x-[LIVMT]-x(2)-[NQSCH]-x(1,3)-[LIVFCA]-x(3)-[LIV]-[DENQ]-x(7)-[LMT]-
x(2)-G-x(2)-G

[1] Otaka E., Hashimoto T., Mizuta K.
Protein Seq. Data Anal. 5:285-300(1993).
[1411] 558. Ribosomal protein S4 signature

Ribosomal protein S4 is one of the proteins from the small ribosomal subunit. In Escherichia coli, S4 is known to bind
directly to 16S ribosomal RNA. Mutations in S4 have been shown to increase translational error frequencies. It belongs
to a family of ribosomal proteins which, on the basis of sequence similarities [1,2], groups: - Eubacterial S4. - Algal
and plant chloroplast S4.

- Cyanelle S4. - Archaebacterial S4. - Mammalian S9. - Yeast YS11 (SUP46).
- Marchantia polymorpha mitochondrial S4. - Dictyostelium discoideum rp1024.

- Yeast protein NAM9 [3]. NAM9 has been characterized as a suppressor for ochre mutations in mitochondrial DNA.
It could be a ribosomal protein that acts as a suppressor by decreasing translation accuracy.

S4 is a protein of 171 to 205 amino-acid residues (except for NAM9 which is much larger). The signature pattern for
this protein is based on a conserved region located in the central section of these proteins.

- Consensus pattern [U\ MHDE3- A -R-[Li]-M3J-[LhA1C3-[Uv5FrHQ3-EKRT3-M3HSTAGCV FJ-v-fSTHSK^AIHKP]- 
40 x-[LIVMF]{2) 

[1] Mizuta K., Hashimoto T., Suzuki K.I., Otaka E. Nucleic Acids Res. 19:2603-2608(1991).
[2] Otaka E., Hashimoto T., Mizuta K. Protein Seq. Data Anal. 5:285-300(1993).

[3] Boguta M., Dmochowska A., Borsuk P., Wrobel K., Gargouri A., Lazowska J., Slonimski P.P., Szczesniak B.,
Kruszewska A. Mol. Cell. Biol. 12:402-412(1992).

[1412] 559. Ribosomal protein S4e signature

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities.
One of these families consists of:

- Mammalian S4 [1]. Two highly similar isoforms of this protein exist: one coded by a gene on chromosome Y, and
the other on chromosome X.

- Plant cytoplasmic S4 [2]. - Yeast S7 (YS6). - Archaebacterial S4e.

These proteins have 233 to 2^4 amino acids.

A highly conserved stretch of 15 residues in their N-terminal section has been selected as a signature pattern. Four
positions in this region are positively charged residues.
- Consensus pattern: H-x-K-R-[LIVM]-[SANK]-x-P-x(2)-[WY]-x-[LIVM]-x-[KRP]

[1] Fisher E.M., Beer-Romero P., Brown L.G., Ridley A., McNeil J.A., Lawrence J.B., Willard H.F., Bieber F.R.,
Page D.C. Cell 63:1205-1218(1990).
[2] Braun H.P., Emmermann M., Mentzel H., Schmitz U.K. Biochim. Biophys. Acta 1218:435-438(1994).

[1413] 560. Ribosomal protein S5 signature

Ribosomal protein S5 is one of the proteins from the small ribosomal subunit. In Escherichia coli, S5 is known to be
important in the assembly and function of the 30S ribosomal subunit. Mutations in S5 have been shown to increase
translational error frequencies. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities
[1,2], groups: - Eubacterial S5.

- Cyanelle S5. - Red algal chloroplast S5. - Archaebacterial S5.

- Mammalian S2 (LLrep3). - Caenorhabditis elegans S2 (C49H3.11).

- Drosophila S2. - Plant S2. - Yeast S4 (SUP44). - Fungi mitochondrial S5.

S5 is a protein of 166 to 254 amino-acid residues. The signature pattern for this protein is based on a conserved region,
rich in glycine residues, and located in the N-terminal section of these proteins.

- Consensus pattern: G-[KRQ]-x(3)-[FY]-x-[ACV]-x(2)-[LIVMA]-[LIVM]-[AG]-[DN]-x(2)-G-x-[LIVM]-G-x-[SAG]-x
(5,6)-[DEQ]-[LIVMA]-x(2)-A-[LIVMF]

[1] All-Robyn J.A., Brown N., Otaka E., Liebman S.W.

Mol. Cell. Biol. 10:6544-6553(1990).
[2] Otaka E., Hashimoto T., Mizuta K. Protein Seq. Data Anal. 5:285-300(1993).
[1414] 561. Ribosomal protein S6 signature

Ribosomal protein S6 is one of the proteins from the small ribosomal subunit. In Escherichia coli, S6 is known to bind
together with S18 to 16S ribosomal RNA. It belongs to a family of ribosomal proteins which, on the basis of sequence
similarities, groups: - Eubacterial S6. - Red algal chloroplast S6.

- Cyanelle S6.

S6 is a protein of 95 to 208 amino-acid residues. The signature pattern for this protein is based on a conserved region
located in the N-terminal section of these proteins.

- Consensus pattern: G-x-[KRC]-[DENQRH]-L-[SA]-Y-x-I-[KRNSA]

[1415] 562. Ribosomal protein S6e signature

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities.
One of these families consists of:

- Mammalian S6 [1]. - Drosophila S6 [2]. - Plant S6 [3]. - Yeast S10 (YS4).

- Halobacterium marismortui HS13 [4]. - Methanococcus jannaschii MJ1260.

S6 is the major substrate of protein kinases in eukaryotic ribosomes [5]; it may have an important role in controlling
cell growth and proliferation through the selective translation of particular classes of mRNA.

These proteins have 135 to 249 amino acids.

A conserved stretch of 12 residues in the N-terminal part of these proteins has been selected as a signature pattern.

- Consensus pattern: [LIVM]-[STAMR]-G-G-x-D-x(2)-G-x-P-M

[1] Franco R., Rosenfeld M.G. J. Biol. Chem. 265:4321-4325(1990).

[2] Watson K.L., Konrad K.D., Woods D.F., Bryant P.J. Proc. Natl. Acad. Sci. U.S.A. 89:11302-11306(1992).
[3] Hansen G., Estruch J.J., Spena A. Nucleic Acids Res. 20:5230-5230(1992).
[4] Kimura M., Arndt E., Hatakeyama T., Hatakeyama T., Kimura J. Can. J. Microbiol. 35:195-199(1989).
[5] Bandi H.R., Ferrari S., Krieg J., Meyer H.E., Thomas G. J. Biol. Chem. 268:4530-4533(1993).

[1416] 563. Ribosomal protein S7 signature

Ribosomal protein S7 is one of the proteins from the small ribosomal subunit. In Escherichia coli, S7 is known to bind
directly to part of the 3' end of 16S ribosomal RNA. It belongs to a family of ribosomal proteins which, on the basis of
sequence similarities [1,2,3], groups: - Eubacterial S7.

- Algal and plant chloroplast S7. - Cyanelle S7. - Archaebacterial S7.
- Plant mitochondrial S7. - Mammalian S5. - Plant S5.
- Caenorhabditis elegans S5 (T05E11.1).

The best conserved region, located in the N-terminal section of these proteins, has been selected as a signature pattern.

- Consensus pattern: [DENSK]-x-[LIVMDET]-x(3)-[LIVMFTA](2)-x(6)-G-K-[KR]-x(5)-[LIVMF]-[LIVMFC]-x(2)-
[STAC]

[1] Klussmann S., Franke P., Bergmann U., Kostka S., Wittmann-Liebold B. Biol. Chem. Hoppe-Seyler 374:
305-312(1993).

[2] Otaka E., Hashimoto T., Mizuta K. Protein Seq. Data Anal. 5:285-300(1993).

[3] Ignatovich O., Cooper M., Kulesza H.M., Beggs J.D. Nucleic Acids Res. 23:4616-4619(1995).

[1417] 564. Ribosomal protein S7e signature 

[1418] A number of eukaryotic ribosomal proteins can be grouped on the basis of sequence similarities [1]. One of
these families consists of:

- Mammalian S7.
- Xenopus S8.

- Insect S7.

- Yeast probable ribosomal protein S7 (N2212).

- Fission yeast probable ribosomal protein S7 (SpAC18G6.13c).

These proteins have about 200 amino acids. A highly conserved stretch of 14 residues, which is located in the central
section and which is rich in charged residues, was selected as a signature pattern.
[1419] Consensus pattern: [KR]-L-K-R-E-L-E-K-K-P-[SAP]-x-[KR]-H

[1420] [1] Salazar C.E., Mills-Hamm D.M., Kumar V., Collins F.H. Nucleic Acids Res. 21:4147-4147(1993).
[1421] 565. Ribosomal protein S8 signature

Ribosomal protein S8 is one of the proteins from the small ribosomal subunit. In Escherichia coli, S8 is known to bind
directly to 16S ribosomal RNA. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities,
groups: - Eubacterial S8. - Algal and plant chloroplast S8.

- Cyanelle S8. - Archaebacterial S8. - Marchantia polymorpha mitochondrial S8.

- Mammalian S15A. - Plant S15A. - Yeast S22 (S24).

The best conserved region, located in the C-terminal section of these proteins, has been selected as a signature pattern.

- Consensus pattern: [GE]-x(2)-[LIV](2)-[STY]-[ST]-x(2)-G-[LIVM](2)-x(4)-[AG]-[KRHAYI]

[1] Otaka E., Hashimoto T., Mizuta K.
Protein Seq. Data Anal. 5:285-300(1993).

[1422] 566. Ribosomal protein S8e signature 

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities
[1]. One of these families consists of:

- Mammalian S8. - Caenorhabditis elegans S8 (F42C5.8). - Leishmania major S8.

- Plant S8. - Yeast S8 (S14) (Rp19). - Archaebacterial S8e.

These proteins have either about 220 amino acids (in eukaryotes) or about 125 amino acids (in archaebacteria). A
conserved stretch which is located in the N-terminal section and which is rich in positively charged residues has been
selected as a signature pattern.

- Consensus pattern: [KR]-x(2)-[ST]-G-[GA]-x(5)-[HR]-[KG]-[KR]-x-K-x-E-[LM]-G
[1] Engemann S., Herfurth E., Briesemeister U., Wittmann-Liebold B.

J. Protein Chem. 14:189-195(1995).

[1423] 567. Ribosomal protein S9 signature 

Ribosomal protein S9 is one of the proteins from the small ribosomal subunit. It belongs to a family of ribosomal proteins
which, on the basis of sequence similarities [1,2], groups: - Eubacterial S9. - Algal chloroplast S9.

- Cyanelle S9. - Archaebacterial S9. - Mammalian S16. - Plant S16.
- Yeast mitochondrial ribosomal S9.

A conserved region containing many charged residues and located in the central section of these proteins has been
selected as a signature pattern.

- Consensus pattern: G-G-G-x(2)-[GSA]-Q-x(2)-[SA]-x(3)-[GSA]-x-[GSTAV]-[KR]-[GSAL]-[LIF]

[1] Chan Y.-L., Paz V., Olvera J., Wool I.G. FEBS Lett. 263:85-88(1990).

[2] Otaka E., Hashimoto T., Mizuta K.
Protein Seq. Data Anal. 5:285-300(1993).

[1424] 568. Ribulose-phosphate 3-epimerase family signatures
Ribulose-phosphate 3-epimerase (EC 5.1.3.1) (also known as pentose-5-phosphate 3-epimerase or PPE) is the enzyme
that converts D-ribulose 5-phosphate into D-xylulose 5-phosphate in Calvin's reductive pentose phosphate cycle.
In Alcaligenes eutrophus two copies of the gene coding for PPE are known [1]: one is chromosomally encoded (cbbEC),
the other one is on a plasmid (cbbEP). PPE has been found in a wide range of bacteria, archaebacteria, fungi and plants.
The sequence of PPE is highly related to:

- Escherichia coli D-allulose-6-phosphate 3-epimerase (gene alsE).

- Escherichia coli protein sgcE.

- Mycoplasma genitalium hypothetical protein MG112.

All these proteins have from 209 to 241 amino acid residues.

Two conserved regions which are located respectively in the N-terminal and in the central part of these proteins have 
been selected as signature patterns. 

- Consensus pattern: [LIVMF]-H-[LIVMFY]-D-[LIVM]-x-D-x(1,2)-[FY]-[LIVM]-x-N-x-[STAV]
- Consensus pattern: [LIVMA]-x-[LIVM]-M-[ST]-[VS]-x-P-x(3)-G-Q-x-F-x(6)-[NK]-[LIVMC]

[1] Kusian B., Yoo J.G., Bednarski R., Bowien B.
J. Bacteriol. 174:7337-7344(1992).

[1425] 569. (Ricin B lectin) Similarity to lectin domain of ricin beta-chain, 3 copies
[1426] This family consists of a triplicated domain involved in cell agglutination in ricin.
[1427] 570. (Rotamase) PpiC-type peptidyl-prolyl cis-trans isomerase signature

Peptidyl-prolyl cis-trans isomerase (EC 5.2.1.8) (PPIase or rotamase) is an enzyme that accelerates protein folding
by catalyzing the cis-trans isomerization of proline imidic peptide bonds in oligopeptides [1]. Most characterized PPIases
belong to two families, the cyclophilin-type (see <PDOC00154>) and the FKBP-type (see <PDOC00426>). Recently
a third family has been discovered [2,3]. So far, the only biochemically characterized member of this family is the
Escherichia coli protein parvulin (gene ppiC), a small (92 residues) cytoplasmic enzyme that prefers amino acid residues
with hydrophobic side chains like leucine and phenylalanine in the P1 position of the peptide substrates. PpiC
is evolutionarily related to a number of proteins that are also probably PPIases:

- Escherichia coli and Haemophilus influenzae ppiD. PpiD is a PPIase which contains a periplasmic ppiC-like domain
anchored to the inner membrane and which seems to be involved in the folding of outer membrane proteins.
- Escherichia coli surA. SurA is a periplasmic protein that contains two ppiC-like domains.
- Nitrogen-assimilating bacteria protein nifM, which is involved in the activation and stabilization of the iron-component
(nifH) of nitrogenase.
- Bacillus subtilis protein prsA, a membrane-bound lipoprotein involved in protein export.
- Lactococcus and Lactobacillus protease maturation protein prtM, a membrane-bound lipoprotein involved in the
maturation of a secreted serine proteinase.
- Yeast protein ESS1/PTF1 (processing/termination factor 1).
- Drosophila protein dodo (gene dod).
- Mammalian protein PIN1.
- Campylobacter jejuni cell binding factor 2 (CBF2), a secreted antigen.
- Bacillus subtilis hypothetical protein yacD.
- Helicobacter pylori hypothetical protein HP0175.
- A hypothetical slime mold protein.


A conserved region that contains a serine which could play a role in the catalytic mechanism of these enzymes has
been selected as a signature pattern.

- C onsensus pattern f- [GbADE-ij v \i\ Vj|-A vr^si j v, i 4>-[SIO]- <(3 e.)-[G£-Rj G-\-|t.t\,MKGSJ 


[1] Fischer G., Schmid F.X.
Biochemistry 29:2205-2212(1990).

[2] Rudd K.E., Sofia H.J., Koonin E.V., Plunkett G. III, Lazar S.,
Rouviere P.E. Trends Biochem. Sci. 20:14-15(1995).
[3] Rahfeld J.-U., Ruecknagel K.P., Schelbert B., Ludwig B., Hacker J.,

Mann K., Fischer G. FEBS Lett. 352:180-184(1994).

[1428] 571. (RrnaAD) Ribosomal RNA adenine dimethylases signature

A number of enzymes responsible for the dimethylation of adenosines in ribosomal RNAs (EC 2.1.1.48) have been
found [1,2] to be evolutionarily related. These enzymes are:

- Bacterial 16S rRNA dimethylase (gene ksgA), which acts in the biogenesis of ribosomes by catalyzing the dimethylation
of two adjacent adenosines in the loop of a conserved hairpin near the 3'-end of 16S rRNA. Inactivation of
ksgA leads to resistance to the aminoglycoside antibiotic kasugamycin.
- Yeast 18S rRNA dimethylase (gene DIM1), which is functionally similar to ksgA and that dimethylates twin adenosines
in the 3'-end of 18S rRNA.

- Bacterial 'erm' methylases. These enzymes confer resistance to macrolide-lincosamide-streptogramin B (MLS)
antibiotics, such as erythromycin, by dimethylating the adenine residue at position 2058 of 23S rRNA, thus resulting
in a reduced affinity between ribosomes and the MLS antibiotics.
- Caenorhabditis elegans hypothetical protein E02H1.1.

The best conserved region in these enzymes is located in the N-terminal section and corresponds to a region that is
probably involved in S-adenosyl methionine (SAM) binding.

- Consensus pattern: [LIVM]-[LIVMFY]-[DE]-x-G-[STAPV]-G-x-[GA]-x-[LIVMF]-[ST]-x(2)-[LIVM]-x(6)-[LIVMY]-x-
[STAGV]-[LIVMFYHC]-E-x-D

[1] van Gemen B., van Knippenberg P.H. (In) Nucleic acid methylation, Clawson G.A., Willis D.B., Weissbach A.,
Jones P.A., Eds., pp 12-36, Alan R. Liss Inc., New York (1990).
[2] Lafontaine D., Delcour J., Glasser A.-L., Desgres J., Vandenhaute J. J. Mol. Biol. 241:492-497(1994).

[1429] 572. (RuBisCO small) Ribulose bisphosphate carboxylase, small chain. 200 members
[1430] 573. ATP/GTP-binding site motif A (P-loop) (ras)

From sequence comparisons and crystallographic data analysis it has been shown [1,2,3,4,5,6] that an appreciable
proportion of proteins that bind ATP or GTP share a number of more or less conserved sequence motifs. The best
conserved of these motifs is a glycine-rich region, which typically forms a flexible loop between a beta-strand and an
alpha-helix. This loop interacts with one of the phosphate groups of the nucleotide. This sequence motif is generally
referred to as the 'A' consensus sequence [1] or the 'P-loop' [5]. There are numerous ATP- or GTP-binding proteins in
which the P-loop is found. A number of protein families for which the relevance of the presence of such a motif has
been noted are listed below:

- ATP synthase alpha and beta subunits.
- Myosin heavy chains.
- Kinesin heavy chains and kinesin-like proteins.
- Dynamins and dynamin-like proteins.
- Guanylate kinase.
- Thymidine kinase.
- Thymidylate kinase.
- Shikimate kinase.
- Nitrogenase iron protein family (nifH/frxC).
- ATP-binding proteins involved in 'active transport' (ABC transporters) [7].
- DNA and RNA helicases [8,9,10].
- GTP-binding elongation factors (EF-Tu, EF-1alpha, EF-G, EF-2, etc.).
- Ras family of GTP-binding proteins (Ras, Rho, Rab, Ral, Ypt1, SEC4, etc.).
- Nuclear protein ran.
- ADP-ribosylation factors family.
- Bacterial dnaA protein.
- Bacterial recA protein.
- Bacterial recF protein.
- Guanine nucleotide-binding proteins alpha subunits (Gi, Gs, Gt, G0, etc.).
- DNA mismatch repair proteins mutS family.
- Bacterial type II secretion system protein E.

Not all ATP- or GTP-binding proteins are picked up by this motif. A number of proteins escape detection because
the structure of their ATP-binding site is completely different from that of the P-loop. Examples of such proteins
are the E1-E2 ATPases or the glycolytic kinases. In other ATP- or GTP-binding proteins the flexible loop exists in
a slightly different form: this is the case for tubulins or protein kinases. A special mention must be reserved for
adenylate kinase, in which there is a single deviation from the P-loop pattern: in the last position Gly is
found instead of Ser or Thr.
Consensus pattern: [AG]-x(4)-G-K-[ST]
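
Consensus patterns of this kind can be applied to a sequence mechanically. The sketch below is a minimal illustration, not a reference implementation: it translates the pattern syntax used in this section ([..] for allowed residues, {..} for excluded residues, x for any residue, (n) or (n,m) for repeats) into a Python regular expression and scans two short fragments. The function name `prosite_to_regex` and both test fragments are assumptions introduced only for the example; the second fragment is written to mimic the adenylate kinase deviation described above, with Gly in the last position.

```python
import re

def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.split("-"):
        # Separate the residue specification from an optional (n) or (n,m) repeat.
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, lo, hi = m.group(1), m.group(2), m.group(3)
        if core == "x":
            core = "."                      # x stands for any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # {AB} stands for any residue except A or B
        # [AB] classes and single letters are already valid regex atoms.
        if lo and hi:
            core += "{%s,%s}" % (lo, hi)
        elif lo:
            core += "{%s}" % lo
        parts.append(core)
    return "".join(parts)

P_LOOP = "[AG]-x(4)-G-K-[ST]"
p_loop_re = re.compile(prosite_to_regex(P_LOOP))

# Illustrative fragments (assumed for this example, not taken from the document).
ras_like = "MTEYKLVVVGAGGVGKSALTIQ"
adk_like = "IILGGPGSGKGTQ"   # Gly where the pattern requires Ser or Thr

hit = p_loop_re.search(ras_like)
miss = p_loop_re.search(adk_like)
print(p_loop_re.pattern, hit.group() if hit else None, miss)
```

A plain regular-expression search reports only the leftmost non-overlapping match; scanning for overlapping occurrences would need a lookahead or a sliding start position.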

In addition to the proteins listed above, the 'A' motif is also found in a number of other proteins. Most of these proteins
probably bind a nucleotide, but others are definitively not ATP- or GTP-binding (as for example chymotrypsin or human
ferritin light chain).

[1] Walker J.E., Saraste M., Runswick M.J., Gay N.J. EMBO J. 1:945-951(1982).
[2] Moller W., Amons R. FEBS Lett. 186:1-7(1985).
[3] Fry D.C., Kuby S.A., Mildvan A.S. Proc. Natl. Acad. Sci. U.S.A. 83:907-911(1986).
[4] Dever T.E., Glynias M.J., Merrick W.C. Proc. Natl. Acad. Sci. U.S.A. 84:1814-1818(1987).
[5] Saraste M., Sibbald P.R., Wittinghofer A. Trends Biochem. Sci. 15:430-434(1990).
[6] Koonin E.V. J. Mol. Biol. 229:1165-1174(1993).
[7] Higgins C.F., Hyde S.C., Mimmack M.M., Gileadi U., Gill D.R., Gallagher M.P. J. Bioenerg. Biomembr. 22:571-592(1990).
[8] Hodgman T.C. Nature 333:22-23(1988) and Nature 333:578-578(1988) (Errata).
[9] Linder P., Lasko P., Ashburner M., Leroy P., Nielsen P.J., Nishi K., Schnier J., Slonimski P.P. Nature 337:121-122(1989).
[10] Gorbalenya A.E., Koonin E.V., Donchenko A.P., Blinov V.M. Nucleic Acids Res. 17:4713-4730(1989).
[1431] GTP-binding nuclear protein ran signature (ras)

Ran (or TC4) is a small abundant nuclear protein that binds and hydrolyzes GTP and which has been implicated in a
large number of processes including nucleocytoplasmic transport, RNA synthesis, processing and export and cell cycle
checkpoint control [1,2]. Ran is generally included in the RAS 'superfamily' of small GTP-binding proteins [3], but it is
only slightly related to the other RAS proteins. It also differs from RAS proteins in that it lacks cysteine residues at its
C-terminal and is therefore not subject to prenylation. Instead ran has an acidic C-terminus. It is however similar to
RAS family members in requiring a specific guanine nucleotide exchange factor (GEF) and a specific GTPase activating
protein (GAP) as stimulators of overall GTPase activity. The region of the GTP-binding B motif which, in ran, is perfectly
conserved has been selected as a signature pattern.
Consensus pattern: D-T-A-G-Q-E-K-[LF]-G-G-L-R-[DE]-G-Y-Y
Proteins belonging to this family also contain a copy of the ATP/GTP-binding motif 'A' (P-loop).
[1] Scheffzek K., Klebe C., Fritz-Wolf K., Kabsch W., Wittinghofer A. Nature 374:378-381(1995).
[2] Rush M.G., Drivas G., d'Eustachio P. BioEssays 18:103-112(1996).
[3] Valencia A., Chardin P., Wittinghofer A., Sander C. Biochemistry 30:4637-4648(1991).

[1432] 574. recA signature

The bacterial recA protein [1,2,3,E1] is essential for homologous recombination and recombinational repair of DNA
damage. RecA has many activities: it forms filaments, it binds to single- and double-stranded DNA, it binds and hydrolyzes
ATP, it is also a recombinase and, finally, it interacts with lexA, causing its activation and leading to its autocatalytic
cleavage. RecA is a protein of about 350 amino acid residues. Its sequence is very well conserved [3,4,5,E1] among
eubacterial species. It is also found in the chloroplast of plants [6]. The best conserved region, a nonapeptide located
in the middle of the sequence, which is part of the monomer-monomer interface in a recA filament, has been selected
as a signature pattern.

Consensus pattern: A-L-[KR]-[IF]-[FY]-[STA]-[STAD]-[LIVMG]-R

[1] Smith K.C., Wang T.-C.V. BioEssays 10:12-16(1989).
[2] Lloyd A.T., Sharp P.M. J. Mol. Evol. 37:399-407(1993).
[3] Roca A.I., Cox M.M. Prog. Nucleic Acids Res. Mol. Biol. 56:129-223(1997).
[4] Karlin S., Weinstock G.M., Brendel V. J. Bacteriol. 177:6881-6893(1995).
[5] Eisen J.A. J. Mol. Evol. 41:1105-1123(1995).
[6] Cerutti H.D., Osman M., Grandoni P., Jagendorf A.T. Proc. Natl. Acad. Sci. U.S.A. 89:8068-8072(1992).
[E1] http://www.tigr.org/~jeisen/RecA/RecA.html

[1433] 575. Response regulator receiver domain
This domain receives the signal from the sensor partner in bacterial two-component systems. It is usually
found N-terminal to a DNA binding effector domain.
[1] Pao G.M., Saier M.H. J. Mol. Evol. 40:136-154(1995).
[1434] 576. Ribonucleotide reductase large subunit signature

Ribonucleotide reductase (EC 1.17.4.1) [1,2] catalyzes the reductive synthesis of deoxyribonucleotides from their
corresponding ribonucleotides. It provides the precursors necessary for DNA synthesis. Ribonucleotide reductase is
an oligomeric enzyme composed of a large subunit (700 to 1000 residues) and a small subunit (300 to 400 residues).
There are regions of similarities in the sequence of the large chain from prokaryotes, eukaryotes and viruses. One of
these regions has been selected as a signature pattern.

[1435] Consensus pattern: W-x(2)-[LF]-x(6,7)-G-[LIVM]-[FYRA]-[NH]-x(3)-[STAQLIVM]-[ASC]-x(2)-[PA]
[1] Nilsson O., Lundqvist T., Hahne S., Sjoberg B.-M. Biochem. Soc. Trans. 16:91-94(1988).
[2] Reichard P. Science 260:1773-1777(1993).

[1436] 577. Ribonuclease T2 family histidine active sites

The fungal ribonucleases T2 from Aspergillus oryzae, M from Aspergillus saitoi and Rh from Rhizopus niveus are
structurally and functionally related 30 Kd glycoproteins [1] that cleave the 3'-5' internucleotide linkage of RNA via a
nucleotide 2',3'-cyclic phosphate intermediate (EC 3.1.27.1). A number of other RNAses have been found to be
evolutionarily related to these fungal enzymes:

- Self-incompatibility [2] in flowering plants is often controlled by a single gene (S-gene) that has several alleles.
This gene prevents fertilization by self-pollen or by pollen bearing either of the two alleles expressed in the style.
The self-incompatibility glycoprotein from several higher plants of the solanaceae family has been shown [2,3] to be
a ribonuclease.
- Phosphate-starvation induced RNAses LE and LX from tomato [4]. These two enzymes are probably involved in a
phosphate-starvation rescue system.
- Escherichia coli periplasmic RNAse I (EC 3.1.27.6) (gene rna) [5].
- Aeromonas hydrophila periplasmic RNAse.
- Haemophilus influenzae hypothetical protein HI0526.

Two histidine residues have been shown [6,7] to be involved in the catalytic mechanism of RNase T2 and Rh. These
residues and the region around them are highly conserved in all the sequences described above. Two signature
patterns have been developed, one for each of the two active-site histidines. The second pattern also contains a
cysteine which is known to be involved in a disulfide bond.
Consensus pattern: [FYWL]-x-[LIVM]-H-G-L-W-P [H is an active site residue]

Consensus pattern: [LIVMF]-x(2)-[HDGTY]-[EQ]-[FYW]-x-[KR]-H-G-x-C [H is an active site residue] [C is involved in
a disulfide bond]
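
Because these patterns annotate a specific residue, a match can also report where the active-site histidine falls in the sequence. The following is a minimal sketch for the first pattern, [FYWL]-x-[LIVM]-H-G-L-W-P, written directly as a regular expression with a named group around the annotated H. The group name `active_his` and the test fragment are hypothetical and serve only to illustrate the idea.

```python
import re

# First RNase T2 signature, with the annotated histidine captured in a named group.
T2_HIS1 = re.compile(r"[FYWL].[LIVM](?P<active_his>H)GLWP")

# Illustrative fragment (assumed for this example, not a real RNase sequence).
fragment = "AAFTIHGLWPDN"

m = T2_HIS1.search(fragment)
if m:
    # Convert the 0-based string index to the 1-based numbering used for residues.
    active_site_position = m.start("active_his") + 1
    print(m.group(), active_site_position)
```

The same approach would apply to the second pattern by placing one group around the histidine and another around the cysteine involved in the disulfide bond.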

[1] Watanabe H., Naitoh A., Suyama Y., Inokuchi N., Shimada H., Koyama T., Ohgi K., Irie M. J. Biochem. 108:303-310
(1990).
[2] Haring V., Gray J.E., McClure B.A., Anderson M.A., Clarke A.E. Science 250:937-941(1990).
[3] McClure B.A., Haring V., Ebert P.R., Anderson M.A., Simpson R.J., Sakiyama F., Clarke A.E. Nature 342:955-957(1989).
[4] Loeffler A., Glund K., Irie M. Eur. J. Biochem. 214:627-633(1993).
[5] Meador J. III, Kennell D. Gene 95:1-7(1990).
[6] Kawata Y., Sakiyama F., Hayashi F., Kyogoku Y. Eur. J. Biochem. 187:255-262(1990).
[7] Kurihara H., Mitsui Y., Ohgi K., Irie M., Mizuno H., Nakamura K.T. FEBS Lett. 306:189-192(1992).

[1437] 578. Ribonucleotide reductase large subunit signature
Ribonucleotide reductase (EC 1.17.4.1) [1,2] catalyzes the reductive synthesis of deoxyribonucleotides from their
corresponding ribonucleotides. It provides the precursors necessary for DNA synthesis. Ribonucleotide reductase is
an oligomeric enzyme composed of a large subunit (700 to 1000 residues) and a small subunit (300 to 400 residues).
There are regions of similarities in the sequence of the large chain from prokaryotes, eukaryotes and viruses. One of
these regions has been developed as a signature pattern.
[1438] Consensus pattern: W-x(2)-[LF]-x(6,7)-G-[LIVM]-[FYRA]-[NH]-x(3)-[STAQLIVM]-[ASC]-x(2)-[PA]
[1439] [1] Nilsson O., Lundqvist T., Hahne S., Sjoberg B.-M. Biochem. Soc. Trans. 16:91-94(1988).
[2] Reichard P. Science 260:1773-1777(1993).

[1440] 579. RNase H

RNase H digests the RNA strand of an RNA/DNA hybrid. It is an important enzyme in the retroviral replication cycle,
and is often found as a domain associated with reverse transcriptases. The structure is a mixed alpha+beta fold with
three a/b/a layers.
[1441] 580. Eukaryotic putative RNA-binding region RNP-1 signature (rrm)

Many eukaryotic proteins that are known or supposed to bind single-stranded RNA contain one or more copies of a putative RNA-binding domain of about 90 amino acids [1,2]. This region has been found in the following proteins:
** Heterogeneous nuclear ribonucleoproteins **
- hnRNP A1 (helix destabilizing protein) (twice).
- hnRNP A2/B1 (twice).
- hnRNP C (C1/C2) (once).
- hnRNP E (UP2) (at least once).
- hnRNP G (once).
** Small nuclear ribonucleoproteins **
- U1 snRNP 70 Kd (once).
- U1 snRNP A (once).
- U2 snRNP B'' (once).
** Pre-RNA and mRNA associated proteins **
- Protein synthesis initiation factor 4B (eIF-4B) [3], a protein essential for the binding of mRNA to ribosomes (once).
- Nucleolin (4 times).
- Yeast single-stranded nucleic acid-binding protein (gene SSB1) (once).
- Yeast protein NSR1 (twice). NSR1 is involved in pre-rRNA processing; it specifically binds nuclear localization sequences.
- Poly(A) binding protein (PABP) (4 times).
** Others **
- Drosophila sex determination protein Sex-lethal (Sxl) (twice).
- Drosophila sex determination protein Transformer-2 (Tra-2) (once).
- Drosophila 'elav' protein (3 times), which is probably involved in the RNA metabolism of neurons.
- Human paraneoplastic encephalomyelitis antigen HuD (3 times) [4], which is highly similar to elav and which may play a role in neuron-specific RNA processing.
- Drosophila 'bicoid' protein (once) [5], a segment-polarity homeobox protein that may also bind to specific mRNAs.
- La antigen (once), a protein which may play a role in the transcription of RNA polymerase III.
- The 60 Kd Ro protein (once), a putative RNP complex protein.
- A maize protein induced by abscisic acid in response to water stress, which seems to be an RNA-binding protein.
- Three tobacco proteins, located in the chloroplast [6], which may be involved in splicing and/or processing of chloroplast RNAs (twice).
- X16 [7], a mammalian protein which may be involved in RNA processing in relation with cellular proliferation and/or maturation.
- Insulin-induced growth response protein Cl-4 from rat (twice).
- Nucleolysins TIA-1 and TIAR (3 times) [8], which possess nucleolytic activity against cytotoxic lymphocyte target cells and may be involved in apoptosis.
- Yeast RNA15 protein, which plays a role in mRNA stability and/or poly-(A) tail length [9].
Inside the putative RNA-binding domain there are two regions which are highly conserved. The first one is a hydrophobic segment of six residues (which is called the RNP-2 motif), the second one is an octapeptide motif (which is called RNP-1 or RNP-CS). The position of both motifs in the domain is shown in the following schematic representation:
[1442] [Schematic representation of the domain, showing the RNP-2 motif followed by the RNP-1 motif.]

The RNP-1 motif has been used as a signature pattern for this type of domain.
Consensus pattern: [RK]-G-{EDRKHPCG}-[AGSCI]-[FY]-[LIVA]-x-[FYLM] In most cases the residue in position 3 of the pattern is either Tyr or Phe.

[1] Bandziulis R.J., Swanson M.S., Dreyfuss G. Genes Dev. 3:431-437(1989).
[2] Dreyfuss G., Swanson M.S., Pinol-Roma S. Trends Biochem. Sci. 13:86-91(1988).
[3] Milburn S.C., Hershey J.W.B., Davies M.V., Kelleher K., Kaufman R.J. EMBO J. 9:2783-2790(1990).
[4] Szabo A., Dalmau J., Manley G., Rosenfeld M., Wong E., Henson J., Posner J.B., Furneaux H.M. Cell 67:325-333(1991).
[5] Rebagliati M. Cell 58:231-232(1989).
[6] Li Y., Sugiura M. EMBO J. 9:3059-3066(1990).
[7] Ayane M., Preuss U., Koehler G., Nielsen P.J. Nucleic Acids Res. 19:1273-1278(1991).
[8] Kawakami A., Tian Q., Duan X., Streuli M., Schlossman S.F., Anderson P. Proc. Natl. Acad. Sci. U.S.A. 89:8681-8685(1992).
[9] Minvielle-Sebastia L., Winsor B., Bonneaud N., Lacroute F. Mol. Cell. Biol. 11:3075-3087(1991).

[1443] 581. Rubredoxin signature

[1444] Rubredoxins [1] are small electron-transfer prokaryotic proteins. They contain an iron atom which is ligated by four cysteine residues. Rubredoxins are, in some cases, functionally interchangeable with ferredoxins.
[1445] A conserved region that includes two of the cysteine residues that bind the iron atom has been selected as a pattern for these proteins.

[1446] Consensus pattern: [LIVM]-x(3)-W-x-C-P-x-C-[AGD] [The two C's bind the iron atom]

In Pseudomonas oleovorans rubredoxin 2 (gene alkG) [2], this pattern is found twice because alkG has two rubredoxin domains.

Rubrerythrin [3], a protein with inorganic pyrophosphatase activity from Desulfovibrio vulgaris, possesses a C-terminal rubredoxin-like domain, but this domain is too divergent to be detected by the above pattern.
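Because a protein with two rubredoxin domains matches the pattern twice, a scan should report every occurrence rather than only the first. The sketch below assumes the pattern reads [LIVM]-x(3)-W-x-C-P-x-C-[AGD] and uses a made-up toy sequence; it is not the alkG sequence, and the names `RUBREDOXIN` and `signature_starts` are hypothetical.

```python
import re

# Rubredoxin pattern [LIVM]-x(3)-W-x-C-P-x-C-[AGD] written as a regular expression.
RUBREDOXIN = re.compile(r"[LIVM].{3}W.CP.C[AGD]")

def signature_starts(sequence):
    """Return the 1-based start position of every non-overlapping match."""
    return [m.start() + 1 for m in RUBREDOXIN.finditer(sequence)]

# Hypothetical toy sequence containing two rubredoxin-like segments.
toy = "MK" + "VAAAWACPACG" + "TTTT" + "IDEEWTCPKCD" + "E"
starts = signature_starts(toy)
```

A sequence with a single domain would yield a one-element list, and a divergent domain such as the one described for rubrerythrin would yield an empty list.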

[1447] [1] Berg J.M., Holm R.H. (In) Iron-sulfur proteins, Spiro T.G., Ed., pp1-66, Wiley, New-York, (1982).
[2] Kok M., Oldenhuis R., van der Linden M.P.G., Meulenberg C.H.C., Kingma J., Witholt B. J. Biol. Chem. 264:5442-5451(1989).
[3] van Beeumen J.J., van Driessche G., Liu M.-Y., Le Gall J. J. Biol. Chem. 266:20645-20653(1991).
[1448] 582. (asp) Eukaryotic and viral aspartyl proteases active site

Aspartyl proteases, also known as acid proteases (EC 3.4.23.-), are a widely distributed family of proteolytic enzymes [1,2,3] known to exist in vertebrates, fungi, plants, retroviruses and some plant viruses. Aspartate proteases of eukaryotes are monomeric enzymes which consist of two domains. Each domain contains an active site centered on a catalytic aspartyl residue. The domains most probably evolved from the duplication of an ancestral gene encoding a primordial domain. Currently known eukaryotic aspartyl proteases are:
- Vertebrate gastric pepsins A and C (also known as gastricsin).
- Vertebrate chymosin (rennin), involved in digestion and used for making cheese.
- Vertebrate lysosomal cathepsins D (EC 3.4.23.5) and E (EC 3.4.23.34).
- Mammalian renin (EC 3.4.23.15) whose function is to generate angiotensin I from angiotensinogen in the plasma.
- Fungal proteases such as aspergillopepsin A (EC 3.4.23.18), candidapepsin (EC 3.4.23.24), mucoropepsin (EC 3.4.23.23) (mucor rennin), endothiapepsin (EC 3.4.23.22), polyporopepsin (EC 3.4.23.29) and rhizopuspepsin (EC 3.4.23.21).
- Yeast saccharopepsin (EC 3.4.23.25) (proteinase A) (gene PEP4). PEP4 is implicated in posttranslational regulation of vacuolar hydrolases.
- Yeast barrierpepsin (EC 3.4.23.35) (gene BAR1), a protease that cleaves alpha-factor and thus acts as an antagonist of the mating pheromone.
- Fission yeast sxa1, which is involved in degrading or processing the mating pheromones.
Most retroviruses and some plant viruses, such as badnaviruses, encode for an aspartyl protease which is a homodimer of a chain of about 95 to 125 amino acids. In most retroviruses, the protease is encoded as a segment of a polyprotein which is cleaved during the maturation process of the virus. It is generally part of the pol polyprotein and, more rarely, of the gag polyprotein.

Conservation of the sequence around the two aspartates of eukaryotic aspartyl proteases and around the single active site of the viral proteases allows us to develop a single signature pattern for both groups of protease.
Consensus pattern: [LIVMFGAC]-[LIVMTADN]-[LIVFSA]-D-[ST]-G-[STAV]-[STAPDENQ]-x-[LIVMFSTNC]-x-[LIVMFGTA] [D is the active site residue]
[1] Foltmann B. Essays Biochem. 17:52-84(1981).
[2] Davies D.R. Annu. Rev. Biophys. Chem. 19:189-215(1990).
[3] Rao J.K.M., Erickson J.W., Wlodawer A. Biochemistry 30:4663-4671(1991).
[4] Rawlings N.D., Barrett A.J. Meth. Enzymol. 248:105-120(1995).

[1449] 583. (rvt) Reverse transcriptase (RNA-dependent DNA polymerase)

A reverse transcriptase gene is usually indicative of a mobile element such as a retrotransposon or retrovirus. Reverse transcriptases occur in a variety of mobile elements, including retrotransposons, retroviruses, group II introns, bacterial msDNAs, hepadnaviruses and caulimoviruses. Number of members: 1233
[1450] [1] Medline: 91006031. Origin and evolution of retroelements based upon their reverse transcriptase sequences. Xiong Y, Eickbush TH; EMBO J 1990;9:3353-3362.
[1451] 584. (S-AdoMet synt) S-adenosylmethionine synthetase signatures

S-adenosylmethionine synthetase (EC 2.5.1.6) is the enzyme that catalyzes the formation of S-adenosylmethionine (AdoMet) from methionine and ATP [1]. AdoMet is an important methyl donor for transmethylation and is also the propylamino donor in polyamine biosynthesis. In bacteria there is a single isoform of AdoMet synthetase (gene metK), there are two in budding yeast (genes SAM1 and SAM2) and in mammals, while in plants there is generally a multigene family. The sequence of AdoMet synthetase is highly conserved throughout isozymes and species. Two signature patterns have been selected for this type of enzyme: the first is a hexapeptide which seems to be involved in ATP-binding; the second is an almost perfectly conserved glycine-rich nonapeptide.

Consensus pattern G-6-G-0-* .K3-xi\ihGTFYHj-tot-<querKeb Known to belong k this -i^t. d^te<_t«?d nv the pattern 
Consensus pattern: G-[GA]-G-[ASC]-F-S-x-K-[DE] 

[1] Horikawa S., Sasuga J., Shimizu K., Ozasa H., Tsukada K. J. Biol. Chem. 265:13683-13686(1990).

[1452] 585. S1 RNA binding domain

The S1 domain occurs in a wide range of RNA associated proteins. It is structurally similar to cold shock protein, which binds nucleic acids. The S1 domain has an OB-fold structure.
[1] Bycroft M, Hubbard TJ, Proctor M, Freund SM, Murzin AG; Cell 1997;88:235-242.
[1453] 586. SAICAR synthetase signatures

Phosphoribosylaminoimidazole-succinocarboxamide synthase (EC 6.3.2.6) (SAICAR synthetase) catalyzes the seventh step in the de novo purine biosynthetic pathway: the ATP-dependent conversion of 5'-phosphoribosyl-5-aminoimidazole-4-carboxylic acid and aspartic acid to SAICAR [1]. In bacteria (gene purC), fungi (gene ADE1) and plants, SAICAR synthetase is a monofunctional protein; in higher vertebrates it is the N-terminal domain of a bifunctional enzyme that also catalyzes phosphoribosylaminoimidazole carboxylase (AIRC) activity. Two conserved regions in the central section of this enzyme have been selected as signature patterns for SAICAR synthetase.
Consensus pattern: [LIVMF](2)-P-[LIVM]-E-x-[LIVM]-[LIVMCA]-R-x(3)-[TA]-G-S
Consensus pattern: [LIVM]-[LIVMA]-D-x-K-[LIVMFY]-E-F-G
[1] Zalkin H., Dixon J.E. Prog. Nucleic Acid Res. Mol. Biol. 42:259-287(1992).
[1454] 587. (SCP) Extracellular proteins SCP/Tpx-1/Ag5/PR-1/Sc7 signatures

A variety of extracellular proteins from eukaryotes have been found to be evolutionary related:
- Rodent sperm-coating glycoprotein (SCP), also known as acidic epididymal glycoprotein (AEG). This protein is thought to be involved in sperm maturation [1]. It is a protein of about 220 residues and probably contains eight disulfide bonds.
- Mammalian testis-specific protein Tpx-1 [2]. Tpx-1 is highly related to SCP's.
- Mammalian glioma pathogenesis-related protein (GliPR).
- Lizard helothermine, a toxin that blocks ryanodine receptors.
- Venom allergen 5 (Ag5) from vespid wasps and venom allergen 3 (Ag3) from fire ants. These proteins are potent allergens and are the main cause of allergic reactions to stings from insects of the hymenoptera family [3]. Ag5/3 are proteins of about 200 residues and contain four disulfide bonds.
- Plant pathogenesis proteins of the PR-1 family [4]. These proteins are synthesized during pathogen infection or other stress-related responses. They are proteins of about 130 to 140 residues and probably contain three disulfide bonds.
- Proteins Sc7 and Sc14 from the basidiomycete fungus Schizophyllum commune. These extracellular proteins are loosely associated with fruitbody hyphal walls [5]. Sc7/14 are proteins of about 180 residues and probably contain two disulfide bonds.
- Ancylostoma secreted protein from dog hookworm.
- Yeast hypothetical proteins YJL078c, YJL079c and YKR013w.
The exact function of these proteins is not yet known. Two conserved regions located in their C-terminal half have been selected as signature patterns. The second signature contains a cysteine which is known to be involved in a disulfide bond in Ag5.

Consensus pattern: [GDER]-H-[FYWH]-T-Q-[LIVM](2)-W-x(2)-[STN]

Consensus pattern: [LIVMFYH]-[LIVMFY]-x-C-[NQRHS]-Y-x-[PARH]-x-[GL]-N-[LIVMFYWDN] [C is involved in a disulfide bond]

[1] Mizuki N., Kasahara M. Mol. Cell. Endocrinol. 89:25-32(1992).
[2] Kasahara M., Gutknecht J., Brew K., Spurr N., Goodfellow P.N. Genomics 5:527-534(1989).
[3] Lu G., Villalba M., Coscia M.R., Hoffman D.R., King T.P. J. Immunol. 150:2823-2830(1993).
[4] Dixon D.C., Cutt J.R., Klessig D.F. EMBO J. 10:1317-1324(1991).
[5] Schuren F.H.J., Asgeirsdottir S.A., Kothe E.M., Scheer J.M.J., Wessels J.G.H. J. Gen. Microbiol. 139:2083-2090(1993).
[1455] 588. SET domain

SET domains appear to be protein-protein interaction domains. It has been demonstrated that SET domains mediate interactions with a family of proteins that display similarity with dual-specificity phosphatases (dsPTPases) [2].

[1] Tripoulas N, LaJeunesse D, Gildea J, Shearn A; Genetics 1996;143:913-928.
[2] Cui X, De Vivo I, Slany R, Miyamoto A, Firestein R, Cleary ML; Nat Genet 1998;18:331-337.
[1456] 589. Src homology 3 (SH3) domain profile

The Src homology 3 (SH3) domain is a small protein domain of about 60 amino-acid residues first identified as a conserved sequence in the non-catalytic part of several cytoplasmic protein tyrosine kinases (e.g. Src, Abl, Lck) [1]. Since then, it has been found in a great variety of other intracellular or membrane-associated proteins [2,3,4,5]. The SH3 domain has a characteristic fold which consists of five or six beta-strands arranged as two tightly packed anti-parallel beta sheets. The linker regions may contain short helices [6]. The function of the SH3 domain is not well understood. The current opinion is that they mediate assembly of specific protein complexes via binding to proline-rich peptides [7]. In general SH3 domains are found as single copies in a given protein, but there is a significant number of proteins with two SH3 domains and a few with 3 or 4 copies. So far, SH3 domains have been identified in the following proteins:
- Many vertebrate, invertebrate and retroviral cytoplasmic (non-receptor) protein tyrosine kinases. In particular in the Src, Abl, Btk, Csk and ZAP70 families of kinases.
- Mammalian phosphatidylinositol-specific phospholipases C-gamma-1 and -2.
- Mammalian phosphatidylinositol 3-kinase regulatory p85 subunit.
- Mammalian Ras GTPase-activating protein (GAP).
- Adaptor proteins mediating binding of guanine nucleotide exchange factors to growth factor receptors: vertebrate GRB2, Caenorhabditis elegans sem-5 and Drosophila DRK, all of which have two SH3 domains.
- Mammalian Vav oncoprotein, a guanine nucleotide exchange factor of the CDC24 family.
- Some guanine-nucleotide releasing factors of the CDC25 family: yeast CDC25, yeast SCD25, fission yeast ste6.
- MAGUK proteins. These proteins consist of at least three types of domains: one or more copies of the DHR domain, an SH3 domain and a C-terminal guanylate kinase domain. Members of this family are: Drosophila lethal(1)discs large-1 tumor suppressor protein (gene Dlg1), mammalian tight junction protein ZO-1, vertebrate erythrocyte membrane protein p55, Caenorhabditis elegans protein lin-2, rat protein CASK and mammalian synaptic proteins SAP90/PSD-95, CHAPSYN-110/PSD-93, SAP97/DLG1 and SAP102.
- Miscellaneous proteins interacting with vertebrate receptor protein tyrosine kinases: mammalian cytoplasmic protein Nck (3 copies), oncoprotein Crk (2 copies).
- Chicken Src substrate p80/85 protein (cortactin) and the similar human hemopoietic lineage cell specific protein Hs1.
- Mammalian dihydropyridine-sensitive L-type calcium channel beta (regulatory) subunit including the related human myasthenic syndrome antigen B (MSYB).
- Mammalian neutrophil cytosolic activators of NADPH oxidase: p47 (NCF-1), p67 (NCF-2) and a potential homolog from Caenorhabditis elegans (B0303.7). NCF-1 and -2 have two copies of the SH3 domain, while B0303.7 has four.
- Some myosin heavy chains from amoebae, slime molds and yeast (gene MYO3).
- Vertebrate and Drosophila spectrin and fodrin alpha-chain.
- Human amphiphysin.
- Yeast actin-binding protein ABP1.
- Yeast actin-binding protein SLA1 (3 copies).
- Yeast protein BEM1 and the fission yeast homolog scd2 (or ral3) (2 copies).
- Yeast BEM1-binding proteins BOI2 (BEB1) and BOB1 (BOI1).
- Yeast fusion protein FUS1.
- Yeast protein RVS167.
- Yeast protein SSU81.
- Yeast hypothetical proteins YAR014c (1 copy), YFR024c (1 copy), YHL002w (1 copy), YHR016c (1 copy), YJL020C (1 copy), YHR114w (2 copies) and the fission yeast homolog SpAC12C2.05c.
- Caenorhabditis elegans hypothetical protein F42H10.3.
The profile developed to detect SH3 domains is based on a structural alignment consisting of 5 gap-free blocks and 4 linker regions totaling 62 match positions.

[1] Mayer B.J., Hamaguchi M., Hanafusa H. Nature 332:272-275(1988).
[2] Musacchio A., Gibson T., Lehto V.-P., Saraste M. FEBS Lett. 307:55-61(1992).
[3] Pawson T., Schlessinger J. Curr. Biol. 3:434-442(1993).
[4] Mayer B.J., Baltimore D. Trends Cell Biol. 3:8-13(1993).
[5] Pawson T. Nature 373:573-580(1995).
[6] Kuriyan J., Cowburn D. Curr. Opin. Struct. Biol. 3:828-837(1993).
[7] Morton C.J., Campbell I.D. Curr. Biol. 4:615-617(1994).
[1457] 590. Serine hydroxymethyltransferase pyridoxal-phosphate attachment site (SHMT)
Serine hydroxymethyltransferase (EC 2.1.2.1) (SHMT) [1] catalyzes the transfer of the hydroxymethyl group of serine to tetrahydrofolate to form 5,10-methylenetetrahydrofolate and glycine. In vertebrates, it exists in a cytoplasmic and a mitochondrial form, whereas only one form is found in prokaryotes. Serine hydroxymethyltransferase is a pyridoxal-phosphate containing enzyme. The pyridoxal-P group is attached to a lysine residue around which the sequence is highly conserved in all forms of the enzyme.

Consensus pattern: [DEH]-[LIVMFY]-x-[STMV]-[GST]-[ST](2)-H-K-[ST]-[LF]-x-G-[PAC]-[RQ]-[GSA]-[GA] [K is the pyridoxal-P attachment site]

[1] Usha R., Savithri H.S., Rao N.A. Biochim. Biophys. Acta 1204:75-83(1994).
[1458] 591. SIS domain

SIS (Sugar ISomerase) domains are found in many phosphosugar isomerases and phosphosugar binding proteins.
[1] Teplyakov A, Obmolova G, Badet-Denisot MA, Badet B, Polikarpov I; Structure 1998;6:1047-1055.

[1459] 592. (SKI) Shikimate kinase signature

Shikimate kinase (EC 2.7.1.71) catalyzes the fifth step in the biosynthesis from chorismate of the aromatic amino acids (the shikimate pathway) in bacteria (gene aroK or aroL), plants and in fungi (where it is part of a multifunctional enzyme which catalyzes five consecutive steps in this pathway). Shikimate kinase is a small protein of about 200 residues. A conserved region that contains a run of three glycines has been selected as a signature pattern.

Consensus pattern: [KR]-x(2)-E-x(3)-[LIVMF]-x(8,12)-[LIVMF](2)-[SA]-x-G(3)-x-[LIVMF]. Proteins belonging to this family also contain a copy of the ATP/GTP-binding motif 'A' (P-loop).
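Since members of this family are expected to carry both the shikimate kinase signature and a P-loop, a simple check can test for the two motifs together. The sketch below is an illustration under stated assumptions: the signature is read as [KR]-x(2)-E-x(3)-[LIVMF]-x(8,12)-[LIVMF](2)-[SA]-x-G(3)-x-[LIVMF], the P-loop is taken in its commonly cited form [AG]-x(4)-G-K-[ST] (this form is not given in the text above), and the sequences and the function name `motif_report` are invented for the example.

```python
import re

# Shikimate kinase signature; x(8,12) becomes a variable-length gap .{8,12}.
SK_SIGNATURE = re.compile(r"[KR].{2}E.{3}[LIVMF].{8,12}[LIVMF]{2}[SA].G{3}.[LIVMF]")

# ATP/GTP-binding motif 'A' (P-loop); assumed form [AG]-x(4)-G-K-[ST].
P_LOOP = re.compile(r"[AG].{4}GK[ST]")

def motif_report(sequence):
    """Report which of the two motifs occur anywhere in the sequence."""
    return {
        "shikimate_kinase_signature": bool(SK_SIGNATURE.search(sequence)),
        "p_loop": bool(P_LOOP.search(sequence)),
    }

# Hypothetical toy sequences, built only to exercise both expressions.
with_both = "M" + "GAAAAGKT" + "DD" + "KAAEAAALAAAAAAAALLSAGGGAL" + "E"
p_loop_only = "MGAAAAGKTDDE"

report_both = motif_report(with_both)
report_p_loop_only = motif_report(p_loop_only)
```

A P-loop on its own is common to many nucleotide-binding proteins, so only a sequence reporting both motifs would be a candidate for this family under this sketch.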
[1460] 593. SNAP-25 family

[1461] SNAP-25 (synaptosome-associated protein 25 kDa) proteins are components of SNARE complexes. Members of this family contain a cluster of cysteine residues that can be palmitoylated for membrane attachment [2].
[1462] [1] Brennwald P, Kearns B, Champion K, Keranen S, Bankaitis V, Novick P; Cell 1994;79:245-258.
[2] Risinger C, Blomqvist AG, Lundell I, Lambertsson A, Nassel D, Pieribone VA, Brodin L, Larhammar D; J Biol Chem 1993;268:24408-24414.

[1463] 594. SNF2 and others N-terminal domain

[1464] This domain is found in proteins involved in a variety of processes including transcription regulation (e.g., SNF2, STH1, brahma, MOT1), DNA repair (e.g., ERCC6, RAD16, RAD5), DNA recombination (e.g., RAD54), and chromatin unwinding (e.g., ISWI), as well as a variety of other proteins with little functional information (e.g., lodestar, ETL1).

[1465] 595. Staphylococcal nuclease homologues (SNase)
Present in all three domains of cellular life. Four copies in the transcriptional coactivator p100. These, however, appear to lack the active site residues of Staphylococcal nuclease. Positions 14 (Asp-21), 34 (Arg-35), 39 (Asp-40), 42 (Glu-43) and 110 (Arg-87) [SNase numbering in parentheses] are thought to be involved in substrate-binding and catalysis.
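The five listed alignment positions can be used to test whether a homologue retains the residues thought to be needed for substrate binding and catalysis. The following sketch assumes each homologue is supplied as one row of a domain alignment, so that string index 13 corresponds to alignment position 14, and so on. The names `SNASE_SITES`, `missing_sites` and `toy_row`, and the toy rows themselves, are hypothetical.

```python
# Alignment position -> residue expected at that position (one-letter code),
# taken from the positions listed above: Asp, Arg, Asp, Glu, Arg.
SNASE_SITES = {14: "D", 34: "R", 39: "D", 42: "E", 110: "R"}

def missing_sites(aligned_row):
    """Return the alignment positions whose expected residue is absent."""
    missing = []
    for position in sorted(SNASE_SITES):
        # Rows shorter than the position are treated as gapped at that site.
        if position <= len(aligned_row):
            residue = aligned_row[position - 1]
        else:
            residue = "-"
        if residue != SNASE_SITES[position]:
            missing.append(position)
    return missing

def toy_row(sites, length=110):
    """Build a made-up alignment row of alanines carrying the given residues."""
    row = ["A"] * length
    for position, residue in sites.items():
        row[position - 1] = residue
    return "".join(row)

intact = missing_sites(toy_row(SNASE_SITES))   # all five residues present
degenerate = missing_sites("A" * 110)          # none of the residues present
```

Under this sketch, a row resembling the p100 copies described above would return a non-empty list, reflecting the loss of the active site residues.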

[1] Ponting CP; Protein Sci 1997;6:459-463. [2] Callebaut I, Mornon JP; Biochem J 1997;321:125-132.
[1466] 596. SPRY domain

The SPRY domain is named from SPla and the RYanodine Receptor. Domain of unknown function. Distant homologues are domains in butyrophilin/marenostrin/pyrin homologues.
[1] Ponting C, Schultz J, Bork P; Trends Biochem Sci 1997;22:193-194.

[1467] 597. (SQS PSY) Squalene and phytoene synthases signatures

Two different polyisoprene synthases have been shown [1,2,3] to share a number of regions of sequence similarities:
- Squalene synthase (EC 2.5.1.21) (farnesyl-diphosphate farnesyltransferase) (SQS), which catalyzes the conversion of two molecules of farnesyl diphosphate (FPP) into squalene. It is the first committed step in the cholesterol biosynthetic pathway. The reaction carried out by SQS is catalyzed in two separate steps: the first is a head-to-head condensation of the two molecules of FPP to form presqualene diphosphate; this intermediate is then rearranged in a NADP-dependent reduction to form squalene. SQS is found in eukaryotes. In yeast it is encoded by the ERG9 gene, in mammals by the FDFT1 gene. SQS seems to be membrane-bound.
- Phytoene synthase (EC 2.5.1.-) (PSY), which catalyzes the conversion of two molecules of geranylgeranyl diphosphate (GGPP) into phytoene. It is the second step in the biosynthesis of carotenoids from isopentenyl diphosphate. The reaction carried out by PSY is catalyzed in two separate steps: the first is a head-to-head condensation of the two molecules of GGPP to form prephytoene diphosphate; this intermediate is then rearranged to form phytoene. PSY is found in all organisms that synthesize carotenoids: plants and photosynthetic bacteria as well as some non-photosynthetic bacteria and fungi. In bacteria PSY is encoded by the gene crtB. In plants PSY is localized in the chloroplast.
As can be seen from the description above, SQS and PSY share a number of functional similarities which are also reflected at the level of their primary structure. In particular, three well conserved regions are shared by SQS and PSY; they could be involved in substrate binding and/or the catalytic mechanism. Signature patterns have been developed for the second and third conserved regions; they are localized in the central part of these enzymes.

Consensus pattern: Y-[CSAM]-x(2)-[VSG]-A-[GSA]-[LIVAT]-[IV]-G-x(2)-[LMSC]-x(2)-[LIV]
Consensus pattern: [LIVM]-G-x(3)-Q-x(2,3)-[ND]-[IF]-x-R-D-[LIVMFY]-x(2)-[DE]-x(4,7)-R-x-[FY]-x-P
[1] Summers C., Karst F., Charles A.D. Gene 136:185-192(1993).
[2] Robinson G.W., Tsay Y.H., Kienzle B.K., Smith-Monroy C.A., Bishop R.W. Mol. Cell. Biol. 13:2706-2717(1993).
[3] Roemer S., Hugueney P., Bouvier F., Camara B., Kuntz M. Biochem. Biophys. Res. Commun. 196:1414-1421(1993).
[1468] 598. SRP54-type proteins GTP-binding domain signature

The signal recognition particle (SRP) is an oligomeric complex that mediates targeting and insertion of the signal sequence of exported proteins into the membrane of the endoplasmic reticulum. SRP consists of a 7S RNA and six protein subunits. One of these subunits, the 54 Kd protein (SRP54), is a GTP-binding protein that interacts with the signal sequence when it emerges from the ribosome. The N-terminal 300 residues of SRP54 include the GTP-binding site (G-domain) and are evolutionary related to similar domains in other proteins which are listed below [1].
- Escherichia coli and Bacillus subtilis ffh protein (P48), a protein which seems to be the prokaryotic counterpart of SRP54. Ffh is associated with a 4.5S RNA in the prokaryotic SRP complex.
- Signal recognition particle receptor alpha subunit (docking protein), an integral membrane GTP-binding protein which ensures, in conjunction with SRP, the correct targeting of nascent secretory proteins to the endoplasmic reticulum membrane. The G-domain is located at the C-terminal extremity of the protein.
- Bacterial ftsY protein, a protein which is believed to play a similar role to that of the docking protein in eukaryotes. The G-domain is located at the C-terminal extremity of the protein.
- The pilA protein from Neisseria gonorrhoeae, which seems to be the homolog of ftsY.
- A protein from the archaebacterium Sulfolobus solfataricus. This protein is also believed to be a docking protein. The G-domain is also at the C-terminus.
- Bacterial flagellar biosynthesis protein flhF.
The best conserved regions in those domains are the sequence motifs that are part of the GTP-binding site, but as those regions are not specific to these proteins, they were not used as a signature pattern. Instead, a conserved region located at the C-terminal end of the domain was selected.

Consensus pattern: P-[LIVM]-x-[FYL]-[LIVMAT]-[GS]-x-[GS]-[EQ]-x(4)-[LIVMF]
[1] Althoff S., Selinger D., Wise J.A. Nucleic Acids Res. 22:1933-1947(1994).

[1469] 599. (STphosphatase) Serine/threonine specific protein phosphatases signature 

Serine/threonine specific protein phosphatases (EC 3.1.3.16) (PP) [1,2,3] are enzymes that catalyze the removal of a phosphate group attached to a serine or a threonine residue. The following types of PP are evolutionary related:
- Protein phosphatase-1 (PP1) is an enzyme of broad specificity. It is inhibited by two thermostable proteins, inhibitor-1 and -2. In mammals, there are two closely related isoforms of PP-1, PP-1alpha and PP-1beta, produced by alternative splicing of the same gene. In Emericella nidulans, PP-1 (gene bimG) plays an important role in mitosis control by reversing the action of the nimA kinase. In yeast, PP-1 (gene SIT4) is involved in dephosphorylating the large subunit of RNA polymerase II.
- Protein phosphatase-2A (PP2A) is also an enzyme of broad specificity. PP2A is a trimeric enzyme that consists of a core composed of a catalytic subunit associated with a 65 Kd regulatory subunit and a third variable subunit. In mammals, there are two closely related isoforms of the catalytic subunit of PP2A, PP2A-alpha and PP2A-beta, encoded by separate genes.
- Protein phosphatase-2B (PP2B or calcineurin), a calcium-dependent enzyme whose activity is stimulated by calmodulin. It is composed of two subunits: the catalytic A-subunit and the calcium-binding B-subunit. The specificity of PP2B is restricted.
In addition to the above-mentioned enzymes, some additional serine/threonine specific protein phosphatases have been characterized and are listed below.
- Mammalian phosphatase-X (PP-X) and Drosophila phosphatase-V (PP-V), which are closely related but yet distinct from PP2A.
- Yeast phosphatase PPH3, which is similar to PP2A, but with different enzymatic properties.
- Drosophila phosphatase-Y (PP-Y) and yeast phosphatases Z1 and Z2 (genes PPZ1 and PPZ2), which are closely related but yet distinct from PP1.
- Drosophila retinal degeneration protein C (gene rdgC), a calcium-binding phosphatase required to prevent light-induced retinal degeneration.
- Phages Lambda and Phi-80 ORF-221, which have been shown to have phosphatase activity and are related to mammalian PP's.
The best conserved region in these proteins is a perfectly conserved pentapeptide that can be used as a signature pattern.
Consensus pattern: [LIVM]-R-G-N-H-E

[1] Cohen P. Annu. Rev. Biochem. 58:453-508(1989).
[2] Cohen P., Cohen P.T.W. J. Biol. Chem. 264:21435-21438(1989).
[3] Cohen P.T.W., Brewis N.D., Hughes V., Mann D.J. FEBS Lett. 268:355-359(1990).
[1470] 600. Translation initiation factor SUI1 signature 

In budding yeast (Saccharomyces cerevisiae), SUI1 is a translation initiation factor that functions in concert with eIF-2 and the initiator tRNA-Met in directing the ribosome to the proper start site of translation [1]. SUI1 is a protein of 108 residues. Close homologs of SUI1 have been found [2] in mammals, insects and plants. SUI1 is also evolutionary related to hypothetical proteins from Escherichia coli (yciH), Haemophilus influenzae (HI1225) and Methanococcus vannielii. A conserved region in the C-terminal section has been selected as a signature pattern.
Consensus pattern: [LIVM]-[EQ]-[LIVM]-Q-G-[DEM]-[KHQ]-[KRV]

[1] Yoon H., Donahue T.F. Mol. Cell. Biol. 12:248-260(1992).
[2] Fields C.A., Adams M.D. Biochem. Biophys. Res. Commun. 198:288-291(1994).

[1471] 601. (S T dehydratase) Serine/threonine dehydratases pyridoxal-phosphate attachment site
Serine and threonine dehydratases [1,2] are functionally and structurally related pyridoxal-phosphate dependent enzymes:
- L-serine dehydratase (EC 4.2.1.13) and D-serine dehydratase (EC 4.2.1.14) catalyze the dehydration of L-serine (respectively D-serine) into ammonia and pyruvate.
- Threonine dehydratase (EC 4.2.1.16) (TDH) catalyzes the dehydration of threonine into alpha-ketobutyrate and ammonia. In Escherichia coli and other microorganisms, two classes of TDH are known to exist. One is involved in the biosynthesis of isoleucine, the other in hydroxamino acid catabolism.
Threonine synthase (EC 4.2.99.2) is also a pyridoxal-phosphate enzyme; it catalyzes the transformation of homoserine-phosphate into threonine. It has been shown [3] that threonine synthase is distantly related to the serine/threonine dehydratases. In all these enzymes, the pyridoxal-phosphate group is attached to a lysine residue. The sequence around this residue is sufficiently conserved to allow the derivation of a pattern specific to serine/threonine dehydratases and threonine synthases.

Consensus pattern: [DESH]-x(4,5)-[STVG]-x-[AS]-[FYI]-K-[DLIFSA]-[RVMF]-[GA]-[LIVMGA] [The K is the pyridoxal-P attachment site]

[1] Ogawa H., Gomi T., Konishi K., Date T., Nakashima H., Nose K., Matsuda Y., Peraino C., Pitot H.C., Fujioka M. J. Biol. Chem. 264:15818-15823(1989).
[2] Datta P., Goss T.J., Omnaas J.R., Patil R.V. Proc. Natl. Acad. Sci. U.S.A. 84:393-397(1987).
[3] Parsot C. EMBO J. 5:3013-3019(1986).
[4] Grabowski R., Hofmeister A.E.M., Buckel W. Trends Biochem. Sci. 18:297-300(1993).

[1472] Cysteine synthase/cystathionine beta-synthase P-phosphate attachment site

Cysteine synthase (CSase) is the pyridoxal-phosphate dependent enzyme responsible [1] for the formation of cysteine from O-acetyl-serine and hydrogen sulfide with the concomitant release of acetic acid. In bacteria such as Escherichia coli, two forms of the enzyme are known (genes cysK and cysM). In plants there are also two forms, one located in the cytoplasm and the other in chloroplasts. Cystathionine beta-synthase [2] catalyzes the first irreversible step in homocysteine transsulfuration: the conjugation of homocysteine and serine, forming cystathionine. Like CSase, it is a pyridoxal-phosphate dependent enzyme. The two types of enzymes are evolutionary related. The pyridoxal-phosphate group of CSases has been shown to be attached to a lysine residue which is located in the N-terminal section of these enzymes; the sequence around this residue is highly conserved and can be used as a signature pattern to detect this class of enzymes.

Consensus pattern: K-x-E-x(3)-[PA]-[STAGC]-x-S-[IVAP]-K-x-R-x-[STAG]-x(2)-[LIVM] [The 2nd K is the pyridoxal-P
attachment site]

[ 1] Saito K., Kurosawa M., Murakoshi I. FEBS Lett. 328:111-114(1993).
[ 2] Swaroop M., Bradley K., Ohura T., Tahara T., Roper M.D., Rosenberg L.E., Kraus J.P. J. Biol. Chem. 267:11455-11461(1992).
[1473] 602. S locus glycop

S-locus glycoprotein family. In Brassicaceae, self-incompatible plants have a self/non-self recognition system.
This is sporophytically controlled by multiple alleles at a single locus (S). S-locus glycoproteins,
as well as S-receptor kinases, are in linkage with the S-alleles [1]. Number of members: 128
[1] Evolutionary aspects of the S-related genes of the Brassica self-incompatibility system: synonymous and
nonsynonymous base substitutions. Hinata K, Watanabe M, Yamakawa S, Satta Y, Isogai A; Genetics 1995;140:1099-1104.
[2] Polymorphism of the S-locus glycoprotein gene (SLG) and the S-locus related gene (SLR1) in Raphanus sativus
L. and self-incompatible ornamental plants in the Brassicaceae. Sakamoto K, Kusaba M, Nishio T; Mol Gen Genet
1998;258:397-403.

[1474] 603. (sdh_cyt) Succinate dehydrogenase cytochrome b subunit signatures

Succinate dehydrogenase (SDH) is a membrane-bound complex of two main components: a membrane-extrinsic
component composed of an FAD-binding flavoprotein and an iron-sulfur protein, and a hydrophobic component composed
of a cytochrome b and a membrane anchor protein. The cytochrome b component is a mono heme transmembrane
protein [1,2,3] belonging to a family that groups: - Cytochrome b-556 from bacterial SDH (gene sdhC). - Cytochrome
b560 from the mammalian mitochondrial SDH complex. - Cytochrome b560 subunit encoded in the mitochondrial
genome of some algae and in the plant Marchantia polymorpha. - Cytochrome b from yeast mitochondrial SDH complex
(gene SDH3 or CYB3). - Protein cyt-1 from Caenorhabditis. These cytochromes are proteins of about 130 residues that
comprise three transmembrane regions. There are two conserved histidines which may be involved in binding the heme
group. Two signature patterns have been developed that include these histidine residues.
Consensus pattern: R-P-[LIVMT]-x(3)-[LIVM]-x(8)-[LIVMWPK]-x(4)-S-x(2)-H-R-x-[ST] [H could be a heme ligand]
Consensus pattern: H-x(3)-[GA]-[LIVMT]-R-[HF]-[LIVMF]-x-[FYWM]-D-x-[GVA] [H could be a heme ligand]
[ 1] Yu L., Wei Y.-Y., Usui S., Yu C.-A. J. Biol. Chem. 267:24508-24515(1992).
[ 2] Abraham P.R., Mulder A., Van't Riet J., Raue H.A. Mol. Gen. Genet. 242:708-718(1994).
[ 3] Leblanc C., Boyen C., Richard O., Bonnard G., Grienenberger J.-M., Kloareg B. J. Mol. Biol. 250:484-495(1995).
[1475] 604. Sec1 family

[1] The Sec1 family: a novel family of proteins involved in synaptic transmission and general secretion. Halachmi N,
Lev Z; J Neurochem 1996;66:889-897.
Number of members: 40

[1476] 605. Protein secE/sec61-gamma signature

In bacteria, the secE protein plays a role in protein export; it is one of the components - with secY and secA - of the
preprotein translocase. In eukaryotes, the evolutionary related protein sec61-gamma plays a role in protein translocation
through the endoplasmic reticulum; it is part of a trimeric complex that also consists of sec61-alpha and beta [1]. Both
secE and sec61-gamma are small proteins of about 80 to 90 amino acids that contain a single transmembrane region
at their C-terminal extremity (Escherichia coli secE is an exception, in that it possesses an extra N-terminal segment of
60 residues that contains two additional transmembrane domains). The sequence of secE/sec61-gamma is not extremely
well conserved; however, it is possible to derive a signature pattern centered on a conserved proline located 10
residues before the beginning of the transmembrane domain.

Consensus pattern: [LIVMFY]-x(2)-[DENQGA]-x(4)-[LIVMFTA]-x-[K…
[LIVFGAST]

[ 1] Hartmann E., Sommer T., Prehn S., Goerlich D., Jentsch S., Rapoport T.A. Nature 367:654-657(1994).
[1477] 606. 11-S plant seed storage proteins signature
Plant seed storage proteins, whose principal function appears to be the major nitrogen source for the developing plant,
can be classified, on the basis of their structure, into different families. 11-S are non-glycosylated proteins which form
hexameric structures [1,2]. Each of the subunits in the hexamer is itself composed of an acidic and a basic chain
derived from a single precursor and linked by a disulfide bond. This structure is shown in the following representation.



xxxxxxxxxxxCxxxxxxxxxxxxxxxxxxxxxxNGxCxxxxxxxxxxxxxxxxxxxxxxx
<----------Acidic-subunit---------><------Basic-subunit----->
<-----------------About-480-to-500-residues----------------->

'C': conserved cysteine involved in a disulfide bond.
'*': position of the pattern.

Proteins that belong to the 11-S family are: pea and broad bean legumins, rape
cruciferin, rice glutelins, cotton beta-globulins, soybean glycinins, pumpkin 11-S globulin, oat globulin, sunflower
helianthinin G3, etc. The region that includes the conserved cleavage site between the acidic and basic subunits
(Asn-Gly) and a proximal cysteine residue which is involved in the interchain disulfide bond have been used as a signature
pattern for this family of proteins.

Consensus pattern: N-G-x-[DE](2)-x-[LIVMF]-C-[ST]-x(11,12)-[PAG]-D [C is involved in a disulfide bond]
[ 1] Hayashi M., Mori H., Nishimura M., Akazawa T., Hara-Nishimura I. Eur. J. Biochem. 172:627-632(1988).
[ 2] Shotwell M.A., Afonso C., Davies E., Chesnut R.S., Larkins B.A. Plant Physiol. 87:698-704(1988).
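
As a hedged sketch of how the 11-S signature might be used in practice, the example below locates the pattern in a precursor sequence, reports the position of the cysteine annotated as taking part in the disulfide bond, and splits the precursor at the Asn-Gly cleavage site into its acidic and basic chains. The regular expression is a hand translation of the consensus pattern above, and the precursor sequence is hypothetical: it was assembled only so that the pattern matches and is far shorter than a real 11-S protein.

```python
import re

# Hand translation of N-G-x-[DE](2)-x-[LIVMF]-C-[ST]-x(11,12)-[PAG]-D.
# The capture group marks the cysteine involved in the disulfide bond.
ELEVEN_S = re.compile(r"NG.[DE]{2}.[LIVMF](C)[ST].{11,12}[PAG]D")

# Hypothetical toy precursor, not a real legumin or glycinin sequence.
precursor = "MKR" + "NGQEETLCS" + "AKLRQNIGQSS" + "PD" + "IYN"

m = ELEVEN_S.search(precursor)
if m:
    cys_pos = m.start(1) + 1          # 1-based position of the conserved Cys
    cut = m.start() + 1               # cleavage falls between Asn and Gly
    acidic, basic = precursor[:cut], precursor[cut:]
    print("conserved Cys at position", cys_pos)
    print("acidic chain ends with", acidic[-1], "and basic chain starts with", basic[0])
```

Using a capture group for the annotated residue avoids counting offsets by hand, which matters here because the x(11,12) element makes the match length variable.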
[1478] 607. 7S seed storage protein

[1479] 7S globulin is one of the main storage proteins of most angiosperms and gymnosperms. The 7S storage
proteins are homotrimers.
Number of members: 67
[1480] [1] The three-dimensional structure of canavalin from jack bean (Canavalia ensiformis). Ko TP, Ng JD,
McPherson A; Plant Physiol 1993;101:729-744.

[1481] 608. Aspartate-semialdehyde dehydrogenase signature

Aspartate-semialdehyde dehydrogenase (ASD) catalyzes the second step in the common biosynthetic pathway leading
from Asp to diaminopimelate and Lys, to Met, and to Thr: the NADP-dependent reductive dephosphorylation of
L-aspartyl phosphate to L-aspartate-semialdehyde. In bacteria and fungi, ASD is a protein of about 40 Kd (340 to 370
residues) whose sequence is not extremely well conserved [1]. A conserved cysteine residue has been implicated as
important for the catalytic activity [2]. The region of conservation around the active site residue is too small to be used
as a signature pattern. Another more conserved region, located in the last third of the sequence, and which contains
both a conserved cysteine as well as a histidine, has been used instead.
Consensus pattern: [LIVM]-[SADN]-x(2)-C-x-R-[LIVM]-x(4)-[GSC]-H-[STA]

[ 1] Baril C., Richaud C., Fournie E., Baranton G., Saint Girons I. J. Gen. Microbiol. 138:47-53(1992).
[ 2] Karsten W.E., Viola R.E. Biochim. Biophys. Acta 1121:234-238(1992).

[1482] N-acetyl-gamma-glutamyl-phosphate reductase active site

N-acetyl-gamma-glutamyl-phosphate reductase (EC 1.2.1.38) (AGPR) [1,2] is the enzyme that catalyzes the third step
in the biosynthesis of arginine from glutamate, the NADP-dependent reduction of N-acetyl-5-glutamyl phosphate into
N-acetylglutamate 5-semialdehyde. In bacteria it is a monofunctional protein of 35 to 38 Kd (gene argC) while in fungi
it is part of a bifunctional mitochondrial enzyme (gene ARG5,6, arg11 or arg-6) which contains an N-terminal
acetylglutamate kinase (EC 2.7.2.8) domain and a C-terminal AGPR domain. In the Escherichia coli enzyme, a cysteine
has been shown to be implicated in the catalytic activity; the region around this residue is well conserved and can be
used as a signature pattern.

Consensus pattern: [LIVM]-[GSA]-x-P-G-C-[FY]-[AVP]-T-[GA]-x(3)-[GTAC]-[LIVM]-x-P [C is the active site residue]

[ 1] Ludovice M., Martin J.F., Carrachas P., Liras P. J. Bacteriol. 174:4606-4613(1992).
[ 2] Gessert S.F., Kim J.H., Nargang F.E., Weiss R.L. J. Biol. Chem. 269:8189-8203(1994).

[1483] 609. Sialyltransferase family.
Number of members: 18

[1484] 610. SpoU rRNA Methylase family

This family of proteins probably use S-AdoMet. Number of members: 58

[1] SpoU protein of Escherichia coli belongs to a new family of putative rRNA methylases. Koonin EV, Rudd KE; Nucleic
Acids Res 1993;21:5519-5519. [2] The spoU gene of Escherichia coli, the fourth gene of the spoT operon, is essential
for tRNA (Gm18) 2'-O-methyltransferase activity. Persson BC, Jager G, Gustafsson C; Nucleic Acids Res 1997;25:
4093-4097.

[1485] 611. Stathmin family signatures

Stathmin [1] (from the Greek 'stathmos' which means relay) is a ubiquitous intracellular protein, present in a variety
of phosphorylated forms, which serves as a relay for diverse second messenger pathways. Its expression and
phosphorylation are regulated throughout development and in response to extracellular signals regulating cell
proliferation, differentiation and function. Stathmin is a highly conserved protein of 149 amino acid residues. Structurally, it
consists of an N-terminal domain followed by a 73 residue alpha-helical domain consisting of a
heptad repeat coiled coil structure and a C-terminal domain of 25 residues. Protein SCG10 is a neuron-specific
membrane-associated protein that accumulates in the growth cones of developing neurons. It is highly similar in its sequence
to stathmin, but differs in that it contains an additional N-terminal hydrophobic segment of 32 residues which is probably
responsible for its interaction with membranes. Xenopus protein XB3 is also evolutionary related to stathmin and also
contains an additional N-terminal hydrophobic domain [2]. A conserved decapeptide which ends with the first three
residues of the coiled coil domain and a second pattern that corresponds to part of the central region of the coiled coil
have been selected as signatures for proteins of the stathmin family.

Consensus pattern: P-[KRQ]-[KR](2)-[DE]-x-S-L-[EG]-E-
Consensus pattern: A-E-K-R-E-H-E-[KR]-E-

[ 1] Sobel A. Trends Biochem. Sci. 16:301-305(1991).
[ 2] Maucuer A., Moreau J., Mechali M., Sobel A. J. Biol. Chem. 268:16420-16429(1993).

[1486] 612. SUA5/yciO/yrdC family signature

The following uncharacterized proteins have been shown [1] to share
regions of similarities: - Yeast protein SUA5. - Escherichia coli hypothetical protein yciO and HI1198, the corresponding
Haemophilus influenzae protein. - Escherichia coli hypothetical protein yrdC and HI0658, the corresponding
Haemophilus influenzae protein. - Bacillus subtilis hypothetical protein ywlC. - Mycobacterium leprae hypothetical protein in
rfe-hemK intergenic region. - Methanococcus jannaschii hypothetical protein MJ0062. These are proteins of from 20
to 46 Kd which contain a number of conserved regions in their N-terminal section. They can be picked up in the database
by the following pattern.

[1487] Consensus pattern: [LIVMTA](2)-[LIVMFYC]-[PG]-T-[DE]-[STA]-x-[FY]-[GA]-[LIVM]-[GS]-
[1488] [ 1] Bairoch A., Rudd K.E., Robison K. Unpublished observations.
[1489] 613. Sucrose synthase
Sucrose synthases catalyse the synthesis of sucrose from UDP-glucose and fructose. This family includes the bulk of
the sucrose synthase protein. However, the carboxyl terminal region of the sucrose synthases belongs to the glycosyl
transferase family Glycos_transf_1.
[1490] 614. Sulfotransferase proteins
Number of members: 59

[1491] 615. Synaptophysin / synaptoporin signature

Synaptophysin and synaptoporin [1] are structurally related proteins, found in the membrane of synaptic vesicles, which
may function as ionic or solute channels. These two glycoproteins seem to span the membrane four times. Both their
N- and C-termini sequences seem to be cytoplasmically located. As a signature pattern for this family of proteins, a
highly conserved region located in the beginning of the first intravesicular loop, just after the first transmembrane domain,
has been selected. This region contains a cysteine residue that may be involved in a disulfide bond.
Consensus pattern: L-S-V-[DE]-C-x-N-K-T [C may be involved in a disulfide bond]
[ 1] Knaus P., Marqueze-Pouey B., Scherer H., Betz H. Neuron 5:453-462(1990).
[1492] 616. Syndecans signature 

Syndecans [1,2] (from the Greek syndein: to bind together) are a family of transmembrane heparan sulfate
proteoglycans which are implicated in the binding of extracellular matrix components and growth factors. Syndecans bind a
variety of molecules via their heparan sulfate chains and can act as receptors or as co-receptors. Structurally, these
proteins consist of four separate domains: a) A signal sequence; b) An extracellular domain (ectodomain) of variable
length and whose sequence is not evolutionary conserved in the various forms of syndecans. The ectodomain contains
the sites of attachment of the heparan sulfate glycosaminoglycan side chains; c) A transmembrane region; d) A highly
conserved cytoplasmic domain of about 30 to 32 residues which could interact with cytoskeletal proteins. The proteins
known to belong to this family are: - Syndecan 1. - Syndecan 2 or fibroglycan. - Syndecan 3 or neuroglycan or
N-syndecan. - Syndecan 4 or amphiglycan or ryudocan. - Drosophila syndecan. - Caenorhabditis elegans probable
syndecan (F57C7.3). The signature pattern that has been developed for syndecans starts with the last residue of the
transmembrane region and includes the first 10 residues of the cytoplasmic domain. This region, which contains four
basic residues, could act as a stop transfer site.
Consensus pattern: [FY]-R-[IM]-[KR]-K(2)-D-E-G-S-Y

[ 1] Bernfield M., Kokenyesi R., Kato M., Hinkes M.T., Spring J., Gallo R.L., Lose E.J. Annu. Rev. Cell Biol. 8:365-393(1992).
[ 2] David G. FASEB J. 7:1023-1030(1993).

[1493] 617. Syntaxin / epimorphin family signature

The following proteins have been shown to be evolutionary related [1,2,3]: - Epimorphin (or syntaxin 2), a mammalian
mesenchymal protein which plays an essential role in epithelial morphogenesis. - Syntaxin 1A (also known as antigen
HPC-1) and syntaxin 1B, which are synaptic proteins which may be involved in docking of synaptic vesicles at
presynaptic active zones. - Syntaxin 3. - Syntaxin 4, which is potentially involved in docking of synaptic vesicles at presynaptic
active zones. - Syntaxin 5, which mediates endoplasmic reticulum to Golgi transport. - Syntaxin 6, which is involved in
intracellular vesicle trafficking. - Syntaxin 7. - Yeast PEP12 (or VPS6), which is required for the transport of proteases
to the vacuole. - Yeast SED5, which is required for the fusion of transport vesicles with the Golgi complex. - Yeast SSO1
and SSO2, which are required for vesicle fusion with the plasma membrane. - Yeast VAM3, which is required for vacuolar
assembly. - Arabidopsis thaliana protein KNOLLE, which may be involved in cytokinesis. - Caenorhabditis elegans
hypothetical proteins F35G8.4, F48F7.2, F55A11.2 and T01B11.3. The above proteins share the following
characteristics: a size ranging from 30 Kd to 40 Kd; a C-terminal extremity which is highly hydrophobic and is probably involved
in anchoring the protein to the membrane; a central, well conserved region, which seems to be in a coiled-coil
conformation. The pattern specific for this family is based on the most conserved region of the coiled coil domain.
Consensus pattern: [RQ]-x(3)-[LIVMA]-x(2)-[LIVM]-[ESH]-x(2)-[LIVMT]-x-[DEVM]-[LIVM]-x(2)-[LIVM]-[FS]-x(2)-
[LIVM]-x(3)-[LIVT]-x(2)-Q-…

[ 1] Bennett M.K., Garcia-Arraras J.E., Elferink L.A., Peterson K., Fleming A.M., Hazuka C.D., Scheller R.H. Cell 74:863-873(1993).
[ 2] Spring J., Kato M., Bernfield M. Trends Biochem. Sci. 18:124-125(1993).
[ 3] Pelham H.R.B. Cell 73:425-426(1993).
[1494] 618. Sm protein

The U1, U2, U4/U6, and U5 small nuclear ribonucleoprotein particles (snRNPs) involved in pre-mRNA splicing contain
seven Sm proteins (B/B', D1, D2, D3, E, F and G) in common, which assemble around the Sm site present in four of
the major spliceosomal small nuclear RNAs. These proteins contain a common sequence motif in two segments, Sm1
and Sm2, separated by a short variable linker.

[1495] [1] Hermann H, Fabrizio P, Raker VA, Foulaki K, Hornig H, Brahms H, Luhrmann R; EMBO J 1995;14:
2076-2086. [2] Kambach C, Walke S, Young R, Avis JM, de la Fortelle E, Raker VA, Luhrmann R, Li J, Nagai K; Cell
1999;96:375-387.
[1496] 619. Skp1 family

[1497] [1] Stebbins CE, Kaelin WG Jr, Pavletich NP; Science 1999;284:455-461.
[1498] 620. Protein secY signatures

The eubacterial secY protein [1] plays an important role in protein export. It interacts with the signal sequences of
secretory proteins as well as with two other components of the protein translocation system: secA and secE. SecY is
an integral plasma membrane protein of 419 to 492 amino acid residues that apparently contains ten transmembrane
segments. Such a structure probably confers to secY a 'translocator' function, providing a channel for periplasmic and
outer-membrane precursor proteins. Homologs of secY are found in archaebacteria [2]. SecY is also encoded in the
chloroplast genome of some algae [3], where it could be involved in a prokaryotic-like protein export system across the
two membranes of the chloroplast endoplasmic reticulum (CER) which is present in chromophyte and cryptophyte algae.
Two signature patterns have been developed for secY proteins. The first corresponds to the second transmembrane
region, which is the most conserved section of these proteins. The second spans the C-terminal part of the fourth
transmembrane region, a short intracellular loop, and the N-terminal part of the fifth transmembrane region.
Consensus pattern: [GST]-[LIVMF](2)-x-[LIVM]-G-[LIVM]-x-P-[LIVMFY](2)-x-[AS]-[GSTQ]-[LIVMFAT](3)-Q-[LIVMFA](2)

Consensus pattern: [LIVMFYW](2)-x-[DE]-x-[LIVMF…
[ 1] Ito K. Mol. Microbiol. 6:2423-2428(1992).
[ 2] Auer J., Spicker G., Boeck A. Biochimie 73:683-688(1991).
[ 3] Douglas S.E. FEBS Lett. 298:93-96(1992).

[1499] 621. (Seed protein) Small hydrophilic plant seed proteins signature. The following small hydrophilic plant seed
proteins are structurally related: - Arabidopsis thaliana proteins GEA1 and GEA6. - Cotton late embryogenesis
abundant (LEA) protein D-19. - Carrot EMB-1 protein. - Barley LEA proteins B19.1A, B19.1B, B19.3 and B19.4. - Maize late
embryogenesis abundant protein Emb564. - Radish late seed maturation protein p8B6. - Rice embryonic abundant
protein Emp1. - Sunflower 10 Kd late embryogenesis abundant protein (DS10). - Wheat Em proteins. These proteins
contain from 83 to 153 amino acid residues and may play a role [1,2] in equipping the seed for survival, maintaining
a minimal level of hydration in the dry organism and preventing the denaturation of cytoplasmic components. They
may also play a role during imbibition by controlling water uptake. As a signature pattern, the best conserved region
in the sequence of these proteins has been selected; it is a glycine-rich nonapeptide located in the N-terminal section.
[1500] Consensus pattern: G-[EQ]-T-V-V-P-G-G-T

[1501] [ 1] Dure L. III, Crouch M., Harada J., Ho T.-H.D., Mundy J., Quatrano R., Thomas T., Sung Z.R. Plant Mol.
Biol. 12:475-486(1989). [ 2] Gaubier P., Raynal M., Hull G., Huestis G.M., Grellet F., Arenas C., Pages M., Delseny M.
Mol. Gen. Genet. 238:409-418(1993).

[1502] 622. Serine carboxypeptidases, active sites

All known carboxypeptidases are either metallo carboxypeptidases or serine carboxypeptidases. The catalytic activity
of the serine carboxypeptidases, like that of the trypsin family serine proteases, is provided by a charge relay system
involving an aspartic acid residue hydrogen-bonded to a histidine, which is itself hydrogen-bonded to a serine [1].
Proteins known to be serine carboxypeptidases are: - Barley and wheat serine carboxypeptidases I, II, and III [2]. -
Yeast carboxypeptidase Y (YSCY) (gene PRC1), a vacuolar protease involved in degrading small peptides. - Yeast
KEX1 protease, involved in killer toxin and alpha-factor precursor processing. - Fission yeast sxa2, a probable
carboxypeptidase involved in degrading or processing mating pheromones [3]. - Penicillium janthinellum carboxypeptidase
S1 [4]. - Aspergillus niger carboxypeptidase pepF. - Aspergillus satoi carboxypeptidase cpdS. - Vertebrate protective
protein / cathepsin A [5], a lysosomal protein which is not only a carboxypeptidase but also essential for the activity of
both beta-galactosidase and neuraminidase. - Mosquito vitellogenic carboxypeptidase (VCP) [6]. - Naegleria fowleri
virulence-related protein Nf314 [7]. - Yeast hypothetical protein YBR139w. - Caenorhabditis elegans hypothetical
proteins C08H9.1, F13D12.6, F32A5.3, F41C3.5 and K10B2.2. This family also includes: - Sorghum (s)-hydroxymandelonitrile
lyase (hydroxynitrile lyase) (HNL) [8], an enzyme involved in plant cyanogenesis. The sequences surrounding
the active site serine and histidine residues are highly conserved in all these serine carboxypeptidases.

Consensus pattern: [LIVM]-x-[GTA]-E-S-Y-[AG]-[GS] [S is the active site residue]

Consensus pattern: [LIVF]-x(2)-[LIVSTA]-x-[IVPST]-x-[GSDNQL]-[SAGV]-[SG]-H-x-[IVAQ]-P-x(3)-[PSA] [H is the
active site residue]

[ 1] Liao D.-I., Remington S.J. J. Biol. Chem. 265:6528-6531(1990).
[ 2] Sorensen S.B., Svendsen I., Breddam K. Carlsberg Res. Commun. 54:193-202(1989).
[ 3] Imai Y., Yamamoto M. Mol. Cell. Biol. 12:1827-1834(1992).
[ 4] Svendsen I., Hofmann T., Endrizzi J., Remington S.J., Breddam K. FEBS Lett. 333:39-43(1993).
[ 5] Galjart N.J., Morreau H., Willemsen R., Gillemans N., Bonten E.J., d'Azzo A. J. Biol. Chem. 266:14754-14762(1991).
[ 6] Cho W.-L., Deitsch K.W., Raikhel A.S. Proc. Natl. Acad. Sci. U.S.A. 88:10821-10824(1991).
[ 7] Hu W.-N., Kopachik W., Band R.N. Infect. Immun. 60:2418-2424(1992).
[ 8] Wajant H., Mundry K.-W., Pfizenmaier K. Plant Mol. Biol. 26:735-746(1994).
[ 9] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:19-61(1994).

[1503] 623. Serpins signature

Serpins (SERine Proteinase INhibitors) [1,2,3,4] are a group of structurally related
proteins. They are high molecular weight (400 to 500 amino acids), extracellular, irreversible serine protease inhibitors
with a well defined structural-functional characteristic: a reactive region that acts as a 'bait' for an appropriate serine
protease. This region is found in the C-terminal part of these proteins. Proteins which are known to belong to the serpin
family are listed below (references are only provided for recently determined sequences): - Alpha-1 protease inhibitor
(alpha-1-antitrypsin, contrapsin). - Alpha-1-antichymotrypsin. - Antithrombin III. - Alpha-2-antiplasmin. - Heparin
cofactor II. - Complement C1 inhibitor. - Plasminogen activator inhibitors 1 (PAI-1) and 2 (PAI-2). - Glia derived nexin
(GDN) (Protease nexin I). - Protein C inhibitor. - Rat hepatocytes SPI-1, SPI-2 and SPI-3 inhibitors. - Human squamous
cell carcinoma antigen (SCCA), which may act in the modulation of the host immune response against tumor cells. - A
lepidopteran protease inhibitor. - Leukocyte elastase inhibitor which, in contrast to other serpins, is an intracellular
protein. - Neuroserpin [5], a neuronal inhibitor of plasminogen activators and plasmin. - Cowpox virus crmA [6], an
inhibitor of the thiol protease interleukin-1B converting enzyme (ICE). CrmA is the only serpin known to inhibit a
non-serine proteinase. - Some orthopoxviruses probable protease inhibitors, which may be involved in the regulation of the
blood clotting cascade and/or of the complement cascade in the mammalian host. On the basis of strong sequence
similarities, a number of proteins with no known inhibitory activity are said to belong to this family: - Birds ovalbumin
and the related genes X and Y proteins. - Angiotensinogen, the precursor of the angiotensin active peptide. - Barley
protein Z, the major endosperm albumin. - Corticosteroid binding globulin (CBG). - Thyroxine-binding globulin (TBG).
- Sheep uterine milk protein (UTMP) and pig uteroferrin-associated protein (UFAP). - Hsp47, an endoplasmic reticulum
heat-shock protein that binds strongly to collagen and could act as a chaperone in the collagen biosynthetic pathway
[7]. - Maspin, which seems to function as a tumor suppressor [8]. - Pigment epithelium-derived factor precursor (PEDF),
a protein with a strong neurotrophic activity [9]. - Ep45, an estrogen-regulated protein from Xenopus [10]. A signature
pattern has been developed for this family of proteins, centered on a well conserved Pro-Phe sequence which is found
ten to fifteen residues on the C-terminal side of the reactive bond.

[1504] Consensus pattern: [LIVMFY]-x-[LIVMFYAC]-[DNQ]-[RKHQS]-[PST]-F-[LIVMFY]-[LIVMFYC]-x-[LIVMFAH]
[ 1] Carrell R., Travis J. Trends Biochem. Sci. 10:20-24(1985).
[ 2] Carrell R., Pemberton P.A., Boswell D.R. Cold Spring Harbor Symp. Quant. Biol. 52:527-535(1987).
[ 3] Huber R., Carrell R.W. Biochemistry 28:8951-8966(1989).
[ 4] Remold-O'Donnell E. FEBS Lett. 315:105-108(1993).
[ 5] Osterwalder T., Contartese J., Stoeckli E.T., Kuhn T.B., Sonderegger P. EMBO J. 15:2944-2953(1996).
[ 6] Komiyama T., Ray C.A., Pickup D.J., Howard A.D., Thornberry N.A., Peterson E.P., Salvesen G. J. Biol. Chem. 269:19331-19337(1994).
[ 7] Clarke E.P., Sanwal B.D. Biochim. Biophys. Acta 1129:246-248(1992).
[ 8] Zou Z., Anisowicz A., Neveu M., Rafidi K., Sheng S., Sager R., Hendrix M.J., Seftor E., Thor A. Science 263:526-529(1994).
[ 9] Steele F.R., Chader G.J., Johnson L.V., Tombran-Tink J. Proc. Natl. Acad. Sci. U.S.A. 90:1526-1530(1993).
[10] Holland L.J., Suksang C., Wall A.A., Roberts L.R., Moser D.R., Bhattacharya A. J. Biol. Chem. 267:7053-7059(1992).

[1505] 624. Sigma-54 interaction domain signatures and profile

Some bacterial regulatory proteins activate the expression of genes from promoters recognised by core RNA
polymerase associated with the alternative sigma-54 factor. These have a conserved domain of about 230 residues involved
in the ATP-dependent [1,2] interaction with sigma-54. This domain has been found in the proteins listed below: - acoR
from Alcaligenes eutrophus, an activator of the acetoin catabolism operon acoXABC. - algB from Pseudomonas
aeruginosa, an activator of alginate biosynthetic gene algD. - dctD from Rhizobium, an activator of dctA, the C4-dicarboxylate
transport protein. - dhaR from Citrobacter freundii, a regulator of the dha operon for glycerol utilization. - fhlA from
Escherichia coli, an activator of the formate dehydrogenase H and hydrogenase III structural genes. - flbD from
Caulobacter crescentus, an activator of flagellar genes. - hoxA from Alcaligenes eutrophus, an activator of the hydrogenase
operon. - hrpS from Pseudomonas syringae, an activator of hprD as well as other hrp loci involved in plant pathogenicity.
- hupR1 from Rhodobacter capsulatus, an activator of the [NiFe] hydrogenase genes hupSL. - hydG from Escherichia
coli and Salmonella typhimurium, an activator of the hydrogenase activity. - levR from Bacillus subtilis, which regulates
the expression of the levanase operon (levDEFG and sacC). - nifA (as well as anfA and vnfA) from various bacteria,
an activator of the nif nitrogen-fixing operon. - ntrC, from various bacteria, an activator of nitrogen assimilatory genes
such as that for glutamine synthetase (glnA) or of the nif operon. - pgtA from Salmonella typhimurium, the activator of
the inducible phosphoglycerate transport system. - pilR from Pseudomonas aeruginosa, an activator of pilin gene
transcription. - rocR from Bacillus subtilis, an activator of genes for arginine utilization. - tyrR from Escherichia coli,
involved in the transcriptional regulation of aromatic amino-acid biosynthesis and transport. - wtsA, from Erwinia
stewartii, an activator of plant pathogenicity gene wtsB. - xylR from Pseudomonas putida, the activator of the tol plasmid
xylene catabolism operon xylCAB and of xylS. - Escherichia coli hypothetical protein yfhA. - Escherichia coli
hypothetical protein yhgB. About half of these proteins (algB, dctD, flbD, hoxA, hupR1, hydG, ntrC, pgtA and pilR) belong to
signal transduction two-component systems [3] and possess a domain that can be phosphorylated by a sensor-kinase
protein in their N-terminal section. Almost all of these proteins possess a helix-turn-helix DNA-binding domain in their
C-terminal section. The domain which interacts with the sigma-54 factor has an ATPase activity. This may be required
to promote a conformational change necessary for the interaction [4]. The domain contains an atypical ATP-binding
motif A (P-loop) as well as a form of motif B. The two ATP-binding motifs are located in the N-terminal section of the
domain; signature patterns have been developed for both motifs. Other regions of the domain are also conserved. One
of them, located in the C-terminal section, has been selected as a third signature pattern.
Consensus pattern: [LIVMFY](3)-x-G-[DEQ]-[STE]-G-[STAV]-G-K-x(2)-[LIVMFY]
Consensus pattern: [GS]-x-[LIVMF]-x(2)-A-[DNEQASH]-[GNEK]-G-[STIM]-[LIVMFY](3)-[DE]-[EK]-[LIVM]
Consensus pattern: [FYW]-P-[GS]-N-[LIVM]-R-[EQ]-L-x-[NHAT]

[ 1] Morett E., Segovia L. J. Bacteriol. 175:6067-6074(1993).
[ 2] Austin S., Kundrot C., Dixon R. Nucleic Acids Res. 19:2281-2287(1991).
[ 3] Albright L.M., Huala E., Ausubel F.M. Annu. Rev. Genet. 23:311-336(1989).
[ 4] Austin S., Dixon R. EMBO J. 11:2219-2228(1992).
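
Where a family is described by several signatures, as for the sigma-54 interaction domain, a sequence can be screened against all of them at once. The following is a minimal sketch under stated assumptions: the three regular expressions are hand translations of the consensus patterns above, the function name `scan` is illustrative, and the test sequence is a hypothetical construct containing only motif A and the C-terminal signature.

```python
import re

# Hand-translated regular expressions for the three signatures
# (x becomes ".", a repeat count (n) becomes "{n}").
SIGNATURES = {
    "ATP-binding motif A": r"[LIVMFY]{3}.G[DEQ][STE]G[STAV]GK.{2}[LIVMFY]",
    "ATP-binding motif B": r"[GS].[LIVMF].{2}A[DNEQASH][GNEK]G[STIM][LIVMFY]{3}[DE][EK][LIVM]",
    "C-terminal signature": r"[FYW]P[GS]N[LIVM]R[EQ]L.[NHAT]",
}

def scan(sequence):
    """Return {signature name: [1-based start positions]} for signatures that match."""
    hits = {}
    for name, regex in SIGNATURES.items():
        # The lookahead reports overlapping occurrences as well.
        starts = [m.start() + 1 for m in re.finditer(f"(?={regex})", sequence)]
        if starts:
            hits[name] = starts
    return hits

# Hypothetical sequence; not a real regulatory protein.
toy = "MSK" + "LLIHGESGTGKELV" + "AAAA" + "WPGNVRELEN" + "KK"
result = scan(toy)
print(result)
```

Reporting the signatures individually, rather than requiring all three, leaves it to the user to decide how many hits are needed before a sequence is assigned to the family.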

[1506] 625. Sigma-70 factors family signatures

Sigma factors [1] are bacterial transcription initiation factors that promote the attachment of the core RNA polymerase
to specific initiation sites and are then released. They alter the specificity of promoter recognition. Most bacteria express
a multiplicity of sigma factors. Two of these factors, sigma-70 (gene rpoD), generally known as the major or primary
sigma factor, and sigma-54 (gene rpoN or ntrA) direct the transcription of a wide variety of genes. The other sigma
factors, known as alternative sigma factors, are required for the transcription of specific subsets of genes. With regard
to sequence similarity, sigma factors can be grouped into two classes: the sigma-54 and sigma-70 families. The
sigma-70 family includes, in addition to the primary sigma factor, a wide variety of sigma factors, some of which are listed
below: - Bacillus sigma factors involved in the control of sporulation-specific genes: sigma-E (sigE or spoIIGB),
sigma-F (sigF or spoIIAC), sigma-G (sigG or spoIIIG), sigma-H (sigH or spoOC) and sigma-K (sigK or spoIVCB/spoIIIC). -
Escherichia coli and related bacteria sigma-32 (gene rpoH or htpR) involved in the expression of heat shock genes. -
Escherichia coli and related bacteria sigma-27 (gene fliA) involved in the expression of the flagellin gene. - Escherichia
coli sigma-S (gene rpoS or katF) which seems to be involved in the expression of genes required for protection against
external stresses. - Myxococcus xanthus sigma-B (sigB) which is essential for the late-stage differentiation of that
bacteria. Alignments of the sigma-70 family permit the identification of four regions of high conservation [2,3]. Each of
these four regions can in turn be subdivided into a number of sub-regions. Signature patterns based on the two
best-conserved sub-regions have been developed. The first pattern corresponds to sub-region 2.2; the exact function of this
sub-region is not known, although it could be involved in the binding of the sigma factor to the core RNA polymerase.
The second pattern corresponds to sub-region 4.2, which seems to harbor a DNA-binding helix-turn-helix motif involved
in binding the conserved -35 region of promoters recognised by the major sigma factors. The second pattern starts one
residue before the N-terminal extremity of the HTH region and ends six residues after its C-terminal extremity.
Consensus pattern: [DE]-[LIVMF](2)-[HEQS]-x-G-x-[LIVMFA]-G-L-[LIVMFYE]-x-[GSAM]-[LIVMAP]
Consensus pattern: [STN]-x(2)-[DEQ]-[LIVM]-[GAS]-x(4)-[LIVMF]-[PSTG]-x(3)-[LIVMA]-x-[NQR]-[LIVMA]-[EQH]-x(3)-[LIVMFW]-x(2)-[LIVM]

[1] Helmann J.D., Chamberlin M.J. Annu. Rev. Biochem. 57:839-872(1988). [2] Gribskov M., Burgess R.R. Nucleic
Acids Res. 14:6745-6763(1986). [3] Lonetto M.A., Gribskov M., Gross C.A. J. Bacteriol. 174:3843-3849(1992). [4]
Lonetto M.A., Brown K.L., Rudd K.E., Buttner M.J. Proc. Natl. Acad. Sci. U.S.A. 91:7573-7577(1994).
[1507] 626. Signal carboxyl-terminal domain. Number of members: 430
[1508] 627. Signal peptidases I signatures

Signal peptidases (SPases) [1] (also known as leader peptidases) remove the signal peptides from secretory proteins.
In prokaryotes three types of SPases are known: type I (gene lepB), which is responsible for the processing of the
majority of exported pre-proteins; type II (gene lsp), which only processes lipoproteins; and a third type involved in the
processing of pili subunits. SPase I is an integral membrane protein that is anchored in the cytoplasmic membrane by
one (in B. subtilis) or two (in E. coli) N-terminal transmembrane domains, with the main part of the protein protruding in
the periplasmic space. Two residues have been shown [2,3] to be essential for the catalytic activity of SPase I: a serine
and a lysine. SPase I is evolutionarily related to the yeast mitochondrial inner membrane protease subunits 1 and 2
(genes IMP1 and IMP2), which catalyze the removal of signal peptides required for the targeting of proteins from the
mitochondrial matrix, across the inner membrane, into the inter-membrane space [4]. In eukaryotes the removal of
signal peptides is effected by an oligomeric enzymatic complex composed of at least five subunits: the signal peptidase
complex (SPC). The SPC is located in the endoplasmic reticulum membrane. Two components of mammalian SPC,
the 18 Kd (SPC18) and the 21 Kd (SPC21) subunits, as well as the yeast SEC11 subunit, have been shown [5] to share
regions of sequence similarity with prokaryotic SPases I and yeast IMP1/IMP2. Three signature patterns for these
proteins have been developed. The first signature contains the putative active site serine, the second signature contains
the putative active site lysine, which is not conserved in the SPC subunits, and the third signature corresponds to a
conserved region of unknown biological significance which is located in the C-terminal section of all these proteins.
Consensus pattern: [GS]-x-S-M-x-[PS]-[AT]-[LF] [S is an active site residue]

Consensus pattern: K-P-[LIVMSTA](2)-G-Y-[PG]-G-[DE]-x-[LIVM]-x-[LIVMFY] [K is an active site residue]
Consensus pattern: [LIVMFYW](2)-x(2)-G-D-[NH]-x(3)-[SND]-x(2)-[SG]

[1] Dalbey R.E., von Heijne G. Trends Biochem. Sci. 17:474-478(1992). [2] Sung M., Dalbey R.E. J. Biol. Chem. 267:
13154-13159(1992). [3] Black M.T. J. Bacteriol. 175:4957-4961(1993). [4] Nunnari J., Fox T.D., Walter P. Science 262:
1997-2004(1993). [5] van Dijl J.M., de Jong A., Vehmaanpera J., Venema G., Bron S. EMBO J. 11:2819-2828(1992).
[6] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:19-61(1994). [E1]
[1509] 628. (sodcu) Copper/Zinc superoxide dismutase signatures
Copper/Zinc superoxide dismutase (SODC) [1] is one of the three forms of an enzyme that catalyzes the dismutation
of superoxide radicals. SODC binds one atom each of zinc and copper. Various forms of SODC are known: a cytoplasmic
form in eukaryotes, an additional chloroplast form in plants, an extracellular form in some eukaryotes, and a periplasmic
form in prokaryotes. The metal binding sites are conserved in all the known SODC sequences [2]. Two signature
patterns have been derived for this family of enzymes: the first one contains two histidine residues that bind the copper
atom; the second one is located in the C-terminal section of SODC and contains a cysteine which is involved in a
disulfide bond.

Consensus pattern: [GA]-[IMFAT]-H-[LIVF]-H-x(2)-[GP]-[SDG]-x-[STAGDE] [The two H's are copper ligands]

Consensus pattern: G-[GN]-[SGA]-G-x-R-x-[SGA]-C-x(2)-[IV] [C is involved in a disulfide bond]
[1] Bannister J.V., Bannister W.H., Rotilio G. CRC Crit. Rev. Biochem. 22:111-154(1987). [2] Smith M.W., Doolittle R.
F. J. Mol. Evol. 34:175-184(1992).

[1510] 629. (sodfe) Manganese and iron superoxide dismutases signature

Manganese superoxide dismutase (SODM) [1] is one of the three forms of an enzyme that catalyzes the dismutation
of superoxide radicals. The four ligands of the manganese atom are conserved in all the known SODM sequences.
These metal ligands are also conserved in the related iron form of superoxide dismutases [2,3]. A short conserved
region which includes two of the four ligands, an aspartate and a histidine, has been selected as a signature.
Consensus pattern: D-x-W-E-H-[STA]-[FY](2) [D and H are manganese/iron ligands]

[1] Bannister J.V., Bannister W.H., Rotilio G. CRC Crit. Rev. Biochem. 22:111-154(1987). [2] Parker M.W., Blake C.C.
F. FEBS Lett. 229:377-382(1988). [3] Smith M.W., Doolittle R.F. J. Mol. Evol. 34:175-184(1992).
[1511] 630. Spectrin repeat

[1512] Spectrin repeats are found in several proteins involved in cytoskeletal structure. These include spectrin,
alpha-actinin and dystrophin. The sequence repeat used in this family is taken from the structural repeat in reference [2]. The
spectrin repeat forms a three helix bundle. The second helix is interrupted by proline in some sequences.
Number of members: 898

[1] Actin-binding proteins. 1: Spectrin super family. Hartwig JH; Protein Profile 1995;2:732-732. [2] Crystal
structure of the repetitive segments of spectrin. Yan Y, Winograd E, Viel A, Cronin T, Harrison SC, Branton D; Science
1993;262:2027-2030.

[1513] 631. (Subtilase) Streptomyces subtilisin-type inhibitors signature

Bacteria of the Streptomyces family produce a family of proteinase inhibitors [1] characterized by their strong activity
toward subtilisin. They are collectively known as SSI's: Streptomyces Subtilisin Inhibitors. Some SSI's also inhibit trypsin
or chymotrypsin. In their mature secreted form, SSI's are proteins of about 110 residues with two conserved disulfide
bonds.

[Schematic of the SSI sequence.] 'C': conserved cysteine involved in a disulfide bond. '#': active site residue.
'*': position of the pattern.
Consensus pattern: C-x-P-x(2,3)-G-x-H-P-x(4)-A-C-[ATD]-x-L [The two C's are involved in a disulfide bond]
[1] Taguchi S., Kojima S., Terabe M., Miura K.-I., Momose H. Eur. J. Biochem. 220:911-918(1994).
[1514] 632. Sugar transport proteins signatures

In mammalian cells the uptake of glucose is mediated by a family of closely related transport proteins which are called
the glucose transporters [1,2,3]. At least seven of these transporters are currently known to exist (in Human they are
encoded by the GLUT1 to GLUT7 genes). These integral membrane proteins are predicted to comprise twelve
membrane-spanning domains. The glucose transporters show sequence similarities [4,5] with a number of other sugar or
metabolite transport proteins listed below (references are only provided for recently determined sequences). -
Escherichia coli arabinose-proton symport (araE). - Escherichia coli galactose-proton symport (galP). - Escherichia coli
and Klebsiella pneumoniae citrate-proton symport (also known as citrate utilization determinant) (gene cit). -
Escherichia coli alpha-ketoglutarate permease (gene kgtP). - Escherichia coli proline/betaine transporter (gene proP) [6].
- Escherichia coli xylose-proton symport (xylE). - Zymomonas mobilis glucose facilitated diffusion protein (gene glf). -
Yeast high and low affinity glucose transport proteins (genes SNF3, HXT1 to HXT14). - Yeast galactose transporter
(gene GAL2). - Yeast maltose permeases (genes MALTT and MALST). - Yeast myo-inositol transporters (genes ITR1
and ITR2). - Yeast carboxylic acid transporter protein homolog JEN1. - Yeast inorganic phosphate transporter (gene
PHO84). - Kluyveromyces lactis lactose permease (gene LAC12). - Neurospora crassa quinate transporter (gene
Qa-y), and Emericella nidulans quinate permease (gene qutD). - Chlorella hexose carrier (gene HUP1). - Arabidopsis
thaliana glucose transporter (gene STP1). - Spinach sucrose transporter. - Leishmania donovani transporters D1 and
D2. - Leishmania enriettii probable transport protein (LIP). - Yeast hypothetical proteins YBR241c, YCR98c and
YFL040w. - Caenorhabditis elegans hypothetical protein ZK637.1. - Escherichia coli hypothetical proteins yabE, ydjE
and yhjE. - Haemophilus influenzae hypothetical proteins HI0281 and HI0418. - Bacillus subtilis hypothetical proteins
yxbC and yxdF. It has been suggested [4] that these transport proteins have evolved from the duplication of an ancestral
protein with six transmembrane regions; this hypothesis is based on the conservation of two G-R-[KR] motifs. The first
one is located between the second and third transmembrane domains and the second one between transmembrane
domains 8 and 9. Two patterns have been developed to detect this family of proteins. The first pattern is based on the
G-R-[KR] motif, but because this motif is too short to be specific to this family of proteins, a pattern from a larger region
centered on the second copy of this motif was derived. The second pattern is based on a number of conserved residues
which are located at the end of the fourth transmembrane segment and in the short loop region between the fourth
and fifth segments.

Consensus pattern: [LIVMSTAG]-[LIVMFSAG]-x(2)-[LIVMSA]-[DE]-x-[LIVMFYWA]-G-R-[RK]-x(4,6)-[GSTA]
Consensus pattern: [LIVMF]-x-G-[LIVMFA]-x(2)-G-x(8)-[LIFY]-x(2)-[EQ]-x(6)-[RK]
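
The remark that the bare G-R-[KR] motif is too short to be specific can be illustrated with a rough estimate. The sketch below is only an approximation under stated assumptions: residues are treated as independent and uniformly distributed over the 20 amino acids, and the 500-residue protein length is a hypothetical value chosen for illustration, not a figure from this document.

```python
def motif_probability(allowed_per_position, alphabet_size=20):
    """Chance that a random window matches, assuming uniform independent residues."""
    probability = 1.0
    for allowed in allowed_per_position:
        # Each position matches with probability (number of allowed residues) / 20.
        probability *= len(allowed) / alphabet_size
    return probability


def expected_random_hits(allowed_per_position, protein_length):
    """Expected number of chance matches in one protein of the given length."""
    windows = protein_length - len(allowed_per_position) + 1
    return max(windows, 0) * motif_probability(allowed_per_position)


# The short motif G-R-[KR]: one allowed residue, one allowed residue, two allowed residues.
short_motif = ["G", "R", "KR"]

if __name__ == "__main__":
    print(motif_probability(short_motif))          # per-window chance of a random match
    print(expected_random_hits(short_motif, 500))  # hypothetical 500-residue protein
```

Under these assumptions a three-residue motif matches by chance roughly once in every few thousand windows, so across a large sequence database random hits would be common; each additional constrained position in a longer pattern multiplies the chance-match probability down further.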

[1] Silverman M. Annu. Rev. Biochem. 60:757-794(1991). [2] Gould G.W., Bell G.I. Trends Biochem. Sci. 15:18-23
(1990). [3] Baldwin S.A. Biochim. Biophys. Acta 1154:17-49(1993). [4] Maiden M.C.J., Davis E.O., Baldwin S.A., Moore
D.C.M., Henderson P.J.F. Nature 325:641-643(1987). [5] Henderson P.J.F. Curr. Opin. Struct. Biol. 1:590-601(1991).
[6] Culham D.E., Lasby B., Marangoni A.G., Milner J.L., Steer B.A., van Nues R.W., Wood J.M. J. Mol. Biol. 229:
268-276(1993).

[1515] 633 Synaptobrevin signature 

Synaptobrevin [1] is an intrinsic membrane protein of small synaptic vesicles whose function is not yet known but
which is highly conserved in mammals, electric ray (where it is known as VAMP-1), Drosophila and yeast. In yeast
there are two closely related forms of synaptobrevin (genes SNC1 and SNC2) while in mammals there are at least 4
(genes SYB1, SYB2, SYB3 and SYBL1). Structurally, synaptobrevin consists of an N-terminal cytoplasmic domain of from
90 to 110 residues, followed by a transmembrane region, and then by a short (from 2 to 22 residues) C-terminal
intravesicular domain. As a signature pattern for synaptobrevin, a highly conserved stretch of residues located in the central
part of the sequence was selected.

Consensus pattern: N-[LIVM]-[DENS]-[KL]-V-x-[DEQ]-R-x(2)-[KR]-[LIVM]-[STDE]-x-[LIVM]-x-[DE]-[KR]-[TA]-[DE]
[1] Suedhof T.C., Baumert M., Perin M.S., Jahn R. Neuron 2:1475-1481(1989). [2] Gerst J.E., Rodgers L., Riggs M.,
Wigler M. Proc. Natl. Acad. Sci. U.S.A. 89:4338-4342(1992).

[1516] 634 TBC domain. Identification of a TBC domain in GYP6_YEAST and GYP7_YEAST, which are GTPase
activator proteins of yeast Ypt6 and Ypt7, implies that these domains are GTPase activator proteins of Rab-like small
GTPases. Number of members: 55

[1] Medline: 96032578. Molecular cloning of a cDNA with a novel domain present in the tre-2 oncogene and the
yeast cell cycle regulators BUB2 and cdc16. Richardson PM, Zon LI; Oncogene 1995;11:1139-1148.
[2] Medline: 9739893?. A shared domain between a spindle assembly checkpoint protein and Ypt/Rab-specific
GTPase-activators. Neuwald AF; Trends Biochem Sci 1997;22:243-244.

[1517] 635 Transcription factor TFIID repeat signature (TBP)

Transcription factor TFIID (or TATA-binding protein, TBP) [1,2] is a general factor that plays a major role in the activation
of eukaryotic genes transcribed by RNA polymerase II. TFIID binds specifically to the TATA box promoter element
which lies close to the position of transcription initiation. There is a remarkable degree of sequence conservation of a
C-terminal domain of about 180 residues in TFIID from various eukaryotic sources. This region is necessary and
sufficient for TATA box binding. The most significant structural feature of this domain is the presence of two conserved
repeats of a 77 amino-acid region. The intramolecular symmetry generates a saddle-shaped structure that sits astride
the DNA [3]. Drosophila TRF (TBP-related factor) [4] is a sequence-specific transcription factor that also binds to the
TATA box and is highly similar to TFIID. Archaebacteria also possess a TBP homolog [5]. A signature pattern that
spans the last 50 residues of the repeated region has been derived.

Consensus pattern: Y-x-P-x(2)-[IF]-x(2)-[LIVM](2)-x-[KRH]-x(3)-P-[RKQ]-x(3)-L-[LIVM]-F-x-[STN]-G-[KR]-[LIVM]-x
(3)-G-[TAGL]-[KR]-x(7)-[AGC]-x(7)-[LIVMF]
[1] Hoffmann A., Sinn E., Yamamoto T., Wang J., Roy A., Horikoshi M., Roeder R.G. Nature 346:387-390(1990).
[2] Gasch A., Hoffmann A., Horikoshi M., Roeder R.G., Chua N.-H. Nature 346:390-394(1990). [3] Nikolov D.B.,
Hu S.-H., Lin J., Gasch A., Hoffmann A., Horikoshi M., Chua N.-H., Roeder R.G., Burley S.K. Nature 360:40-46(1992).
[4] Crowley T.E., Hoey T., Liu J.-K., Jan Y.N., Jan L.Y., Tjian R. Nature 361:557-561(1993). [5] Marsh T.L., Reich C.I.,
Whitelock R.B., Olsen G.J. Proc. Natl. Acad. Sci. U.S.A. 91:4180-4184(1994).

[1518] 636 Translationally controlled tumor protein signatures (TCTP)

Mammalian translationally controlled tumor protein (TCTP) (or P23) is a protein which has been found to be
preferentially synthesized in cells during the early growth phase of some types of tumor [1,2], but which is also expressed in
normal cells. The physiological function of TCTP is still not known. It is a hydrophilic protein of 18 to 20 Kd. Close
homologs have been found in plants [3], earthworm [4], Caenorhabditis elegans (F52M2.11), Hydra, budding yeast
(YKL056c) [5] and fission yeast (SpAC1F12.02c). Two of the best conserved regions have been selected as signature
patterns for TCTP.

Consensus pattern: [IFA]-[GA]-[GAS]-N-[PAK]-S-[GA]-E-[GDE]-[PAGE]-[DE]-[GA]

Consensus pattern: [FLVH]-[FY]-[IVCT]-G-E-x-[MA]-x(2,5)-[DEN]-[GAST]-x-[LV]-[AV]-x(3)-[FYW]






[1] Boehm H., Benndorf R., Gaestel M., Gross B., Nuernberg P., Kraft R., Otto A., Bielka H. Biochem. Int. 19:277-286
(1989). [2] Makrides S., Chitpatima S.T., Bandyopadhyay R., Brawerman G. Nucleic Acids Res. 16:2350-2350(1988).
[3] Pay A., Heberle-Bors E., Hirt H. Plant Mol. Biol. 19:501-503(1992). [4] Stuerzenbaum S.R., Kille P., Morgan A.J.
Biochim. Biophys. Acta 1398:294-304(1998). [5] Rasmussen S.W. Yeast 10:S63-S68(1994).

[1519] 637. TFIIS zinc ribbon domain signature

Transcription factor S-II (TFIIS) [1] is a eukaryotic protein necessary for efficient RNA polymerase II transcription
elongation past template-encoded pause sites. TFIIS shows DNA-binding activity only in the presence of RNA polymerase
II. It is a protein of about 300 amino acids whose sequence is highly conserved in mammals, Drosophila, yeast (where
it was first known as PPR2, a transcriptional regulator of URA4, and then as DST1, the DNA strand transfer protein
alpha [2]) and in the archaebacteria Sulfolobus acidocaldarius [3]. This family also includes the eukaryotic and
archaebacterial RNA polymerase subunits of the 15 Kd / M family (see <PDOC00790>) as well as the following viral proteins:
- Vaccinia virus RNA polymerase 30 Kd subunit (rpo30) [4]. - African swine fever virus protein I243L [5]. The best
conserved region of all these proteins contains four cysteines that bind a zinc ion and fold in a conformation termed a
'zinc ribbon' [6]. Besides these cysteines there are a number of other conserved residues which can be used to help
define a specific pattern for this type of domain.

Consensus pattern: C-x(2)-C-x(9)-[LIVMQSAR]-[QH]-[STQL]-[RA]-[SACR]-x-[DE]-[DET]-[PGSEA]-x(6)-C-x(2,5)-C-x
(3)-[FW] [The four C's are zinc ligands]

[1] Hirashima S., Hirai H., Nakanishi Y., Natori S. J. Biol. Chem. 263:3858-3863(1988). [2] Kipling D.,
Kearsey S.E. Nature 353:509-509(1991). [3] Langer D., Zillig W. Nucleic Acids Res. 21:2251-2251(1993). [4]
Ahn B.-Y., Gershon P.D., Jones E.V., Moss B. Mol. Cell. Biol. 10:5433-5441(1990). [5] Rodriguez J.M., Salas M.L.,
Vinuela E. Virology 186:40-52(1992). [6] Qian X., Jeon C., Yoon H., Agarwal K., Weiss M.A. Nature 365:277-279(1993).
[1520] 638 Tetrahydrofolate dehydrogenase/cyclohydrolase signatures (THF DHG CYH)

Enzymes that participate in the transfer of one-carbon units are involved in various biosynthetic pathways. In many of
these processes the transfers of one-carbon units are mediated by the coenzyme tetrahydrofolate (THF). Various
reactions generate one-carbon derivatives of THF which can be interconverted between different oxidation states by
formyltetrahydrofolate synthetase (EC 6.3.4.3), methylenetetrahydrofolate dehydrogenase (EC 1.5.1.5 or EC 1.5.1.15)
and methenyltetrahydrofolate cyclohydrolase (EC 3.5.4.9). The dehydrogenase and cyclohydrolase activities are
expressed by a variety of multifunctional enzymes: - Eukaryotic C-1-tetrahydrofolate synthase (C1-THF synthase), which
catalyzes all three reactions described above. Two forms of C1-THF synthases are known [1]; one is located in the
mitochondrial matrix, while the second one is cytoplasmic. In both forms the dehydrogenase/cyclohydrolase domain
is located in the N-terminal section of the 900 amino acid protein and consists of about 300 amino acid residues. The
C1-THF synthases are NADP-dependent. - Eukaryotic mitochondrial bifunctional dehydrogenase/cyclohydrolase [2].
This is a homodimeric NAD-dependent enzyme of about 300 amino acid residues. - Bacterial folD [3]. FolD is a
homodimeric bifunctional NADP-dependent enzyme of about 290 amino acid residues. The sequence of the
dehydrogenase/cyclohydrolase domain is highly conserved in all forms of the enzyme. Two conserved regions have been
selected as signature patterns. The first one is located in the N-terminal part of these enzymes and contains three
acidic residues. The second pattern is a highly conserved sequence of 9 amino acids which is located in the C-terminal
section.

Consensus pattern: [EQ]-x-[EQK]-[LIVM](2)-x(2)-[LIVM]-x(2)-[LIVMY]-N-x-[DN]-x(5)-[LIVMF](3)-Q-L-P-[LV]
Consensus pattern: P-G-G-V-G-P-[MF]-T-[IV]
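
The consensus patterns in this section follow the PROSITE notation: elements are separated by '-', 'x' stands for any residue, square brackets list the residues allowed at a position, curly braces list residues that are excluded, and a parenthesized number or range gives a repeat count. As a minimal sketch of how such a pattern could be applied to a sequence, the notation can be translated into a regular expression. The function names `prosite_to_regex` and `scan` and the test sequence are hypothetical and introduced only for this illustration; the sketch handles only the notation elements listed above.

```python
import re

# One pattern element: a residue, 'x', a [..] class or a {..} exclusion,
# optionally followed by a repeat count such as (2) or (2,5).
_ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Translate a PROSITE-style consensus pattern into a regular expression string."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError("unsupported pattern element: %r" % element)
        residues, low, high = match.groups()
        if residues == "x":
            part = "."                          # any residue
        elif residues.startswith("{"):
            part = "[^" + residues[1:-1] + "]"  # any residue except those listed
        else:
            part = residues                     # single residue or [..] class
        if low is not None:
            part += "{" + low + ("," + high if high else "") + "}"
        parts.append(part)
    return "".join(parts)


def scan(pattern, sequence):
    """Return (1-based start position, matched text) for each non-overlapping match."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start() + 1, m.group()) for m in regex.finditer(sequence)]


if __name__ == "__main__":
    # Made-up test sequence containing the second signature of this family.
    print(scan("P-G-G-V-G-P-[MF]-T-[IV]", "AAPGGVGPMTVAA"))
```

Note that `re.finditer` reports non-overlapping matches only, so a scanner intended to find every occurrence of a pattern with variable-length gaps would need a lookahead-based search instead.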

[1] Shannon K.W., Rabinowitz J.C. J. Biol. Chem. 263:7717-7725(1988). [2] Belanger C., Mackenzie R.E. J. Biol.
Chem. 264:4837-4843(1989). [3] d'Ari L., Rabinowitz J.C. J. Biol. Chem. 266:23953-23958(1991).
[1521] 639 Triosephosphate isomerase active site (TIM)

Triosephosphate isomerase (EC 5.3.1.1) (TIM) [1] is the glycolytic enzyme that catalyzes the reversible interconversion
of glyceraldehyde 3-phosphate and dihydroxyacetone phosphate. TIM plays an important role in several metabolic
pathways and is essential for efficient energy production. It is a dimer of identical subunits, each of which is made up
of about 250 amino-acid residues. A glutamic acid residue is involved in the catalytic mechanism [2]. The sequence
around the active site residue is perfectly conserved in all known TIM's and can be used as a signature pattern for this
type of enzyme.

Consensus pattern: [AV]-Y-E-P-[LIVM]-W-[SA]-I-G-T-[GK] [E is the active site residue]

[1] Lolis E., Alber T., Davenport R.C., Rose D., Hartman F.C., Petsko G.A. Biochemistry 29:6609-6618(1990). [2]
Knowles J.R. Nature 350:121-124(1991).
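
Because the glutamate sits at a fixed offset within the TIM signature, a match also locates the active site residue. The following is a minimal sketch of that idea; the regular expression is a hand translation of the consensus pattern (reading the eighth position as isoleucine, I), and the function name and the test sequence are hypothetical.

```python
import re

# Hand-written regular expression for the TIM signature; the active site
# glutamate (E) is the third residue of every match.
TIM_SIGNATURE = re.compile(r"[AV]YEP[LIVM]W[SA]IGT[GK]")
ACTIVE_SITE_OFFSET = 2  # 0-based offset of E within the match


def active_site_positions(sequence):
    """Return the 1-based positions of the putative active site glutamate."""
    return [m.start() + ACTIVE_SITE_OFFSET + 1 for m in TIM_SIGNATURE.finditer(sequence)]


if __name__ == "__main__":
    # Made-up fragment that contains the signature; not a real database entry.
    print(active_site_positions("MKKAYEPVWAIGTGKTA"))
```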

[1522] 640 Thymidine kinase cellular-type signature (TK)

Thymidine kinase (TK) (EC 2.7.1.21) is a ubiquitous enzyme that catalyzes the ATP-dependent phosphorylation of
thymidine. A comparison of TK sequences has shown [1,2,3] that there are two different families of TK. One family
groups together TK from herpesviruses as well as cellular thymidylate kinases, while the second family currently
consists of TK from the following sources: - Vertebrates. - Bacteria. - Bacteriophage T4. - Pox viruses. - African swine
fever virus (ASFV). - Fish lymphocystis disease virus (FLDV). A conserved region which is located in the C-terminal
section of these enzymes has been selected as a signature pattern for this family of TK.

Consensus pattern: [GA]-x(1,2)-[DE]-x-Y-x-[STAP]-x-C-[NKR]-x-[CH]-[LIVMFYWH]
[1] Boyle D.B., Coupar B.E.H., Gibbs A.J., Seaman L.J., Both G.W. Virology 156:355-365(1987). [2] Blasco R.,
Lopez-Otin C., Munoz M., Bockamp E.-O., Simon-Mateo C., Vinuela E. Virology 178:301-304(1990). [3] Robertson G.R.,
Whalley J.M. Nucleic Acids Res. 16:11303-11317(1988).

[1523] 641 Thymidine kinase from herpesvirus (TK herpes)

[1] Medline: 96003730. Crystal structures of the thymidine kinase from herpes simplex virus type-1 in complex with
deoxythymidine and ganciclovir. Brown DG, Visse R, Sandhu G, Davies A, Rizkallah PJ, Melitz C, Summers WC,
Sanderson MR; Nat Struct Biol 1995;2:876-881.
Number of members: 65

[1524] 642. Nuclear transition protein 2 signatures (TP2)

In mammals, the second stage of spermatogenesis is characterized by the conversion of nucleosomal chromatin to
the compact, non-nucleosomal and transcriptionally inactive form found in the sperm nucleus. This condensation is
associated with a double-protein transition. The first transition corresponds to the replacement of histones by several
spermatid-specific proteins, also called transition proteins, which are themselves replaced by protamines during the
second transition. Nuclear transition protein 2 (TP2) is one of those spermatid-specific proteins. TP2 is a basic,
zinc-binding protein [1] of 116 to 137 amino-acid residues. Structurally, TP2 consists of three distinct parts: a conserved
serine-rich N-terminal domain of about 25 residues, a variable central domain of 20 to 50 residues which contains
cysteine residues, and a conserved C-terminal domain of about 70 residues rich in lysines and arginines. Two signature
patterns for TP2 have been developed, one located in the N-terminal domain, the other in the C-terminal.

Consensus pattern: H-x(3)-H-S-[NS]-S-x-P-Q-S

Consensus pattern K v ft K-^2)-E G-K v{ : t Y -[KRj-K 

[1] Baskaran R., Rao M.R.S. Biochem. Biophys. Res. Commun. 179:1491-1499(1991).
[1525] 643 Thiamine pyrophosphate enzymes signature (TPP enzymes)

A number of enzymes require thiamine pyrophosphate (TPP) (vitamin B1) as a cofactor. It has been shown [1] that
some of these enzymes are structurally related. These related TPP enzymes are: - Pyruvate oxidase (POX) (EC 1.2.3.3).
Reaction catalyzed: pyruvate + orthophosphate + O(2) + H(2)O = acetyl phosphate + CO(2) + H(2)O(2). - Pyruvate
decarboxylase (PDC) (EC 4.1.1.1). Reaction catalyzed: pyruvate = acetaldehyde + CO(2). - Indolepyruvate
decarboxylase (EC 4.1.1.74) [2]. Reaction catalyzed: indole-3-pyruvate = indole-3-acetaldehyde + CO(2). - Acetolactate synthase
(ALS) (EC 4.1.3.18). Reaction catalyzed: 2 pyruvate = acetolactate + CO(2). - Benzoylformate decarboxylase (BFD)
(EC 4.1.1.7) [3]. Reaction catalyzed: benzoylformate = benzaldehyde + CO(2). A conserved region which is located in
their C-terminal section has been selected as a signature pattern for these enzymes.
Consensus pattern: [LIVMF]-[GSA]-x(5)-P-x(4)-[LIVMFYW]-x-[LIVMF]-x-G-D-[GSA]-[GSAC]

[1] Green J.B.A. FEBS Lett. 246:1-5(1989). [2] Koga J., Adachi T., Hidaka H. Mol. Gen. Genet. 226:10-16(1991). [3]
Tsou A.Y., Ransom S.C., Gerlt J.A., Buechter D.D., Babbitt P.C., Kenyon G.L. Biochemistry 29:9856-9862(1990).
[1526] 644 TPR Domain

[1] Medline: 95397415. Tetratrico peptide repeat interactions: to TPR or not to TPR? Lamb JR, Tugendreich S,
Hieter P; Trends Biochem Sci 1995;20:257-259.

[2] Medline: 9815134J. The structure of the tetratricopeptide repeats of protein phosphatase 5: implications for
TPR-mediated protein-protein interactions. Das AK, Cohen PW, Barford D; EMBO J 1998;17:1192-1199.

Number of members: 621

[1527] 645 Uroporphyrin-III C-methyltransferase signatures (TP methylase)

Uroporphyrin-III C-methyltransferase (EC 2.1.1.107) (SUMT) [1,2] catalyzes the transfer of two methyl groups from
S-adenosyl-L-methionine to the C-2 and C-7 atoms of uroporphyrinogen III to yield precorrin-2 via the intermediate
formation of precorrin-1.

SUMT is the first enzyme specific to the cobalamin pathway, and precorrin-2 is a common intermediate in the
biosynthesis of corrinoids such as vitamin B12, siroheme and coenzyme F430. The sequences of SUMT from a variety of
eubacterial and archaebacterial species are currently available. In species such as Bacillus megaterium (gene cobA),
Pseudomonas denitrificans (cobA) or Methanobacterium ivanovii (gene corA) SUMT is a protein of about 25 to 30 Kd.
In Escherichia coli and related bacteria, the cysG protein, which is involved in the biosynthesis of siroheme, is a
multifunctional protein composed of an N-terminal domain, probably involved in transforming precorrin-2 into siroheme, and
a C-terminal domain which has SUMT activity. The sequence of SUMT is related to that of a number of P. denitrificans
and Salmonella typhimurium enzymes involved in the biosynthesis of cobalamin which also seem to be SAM-dependent
methyltransferases [3,4]. The similarity is especially strong with two of these enzymes: cobI/cbiL, which encodes
S-adenosyl-L-methionine--precorrin-2 methyltransferase, and cobM/cbiF, whose exact function is not known. Two
signature patterns have been developed for these enzymes. The first corresponds to a well conserved region in the
N-terminal extremity (called region 1 in [1,3]) and the second to a less conserved region located in the central part of
these proteins (this pattern spans what are called regions 2 and 3 in [1,3]).
Consensus pattern: [LIVM]-[GS]-[STAL]-G-P-G-x(3)-[LIVMFY]-[LIVM]-T-[LIVM]-[KRHQG]-[AG]
Consensus pattern: V-x(2)-[LI]-x(2)-G-D-x(3)-[FYW]-[GS]-x(8)-[LIVF]-x(5,6)-[LIVMFYWPAC]-x-[LIVMY]-x-P-G
[1] Blanche F., Robin C., Couder M., Faucher D., Cauchois L., Cameron B., Crouzet J. J. Bacteriol. 173:4637-4645
(1991). [2] Robin C., Blanche F., Cauchois L., Cameron B., Couder M., Crouzet J. J. Bacteriol. 173:4893-4896(1991).
[3] Crouzet J., Cameron B., Cauchois L., Rigault S., Rouyez M.-C., Blanche F., Thibaut D., Debussche L. J. Bacteriol.
172:5980-5990(1990). [4] Roth J.R., Lawrence J.G., Rubenfield M., Kieffer-Higgins S., Church G.M. J. Bacteriol. 175:
3303-3316(1993). [5] Mattheakis L.C., Shen W.H., Collier R.J. Mol. Cell. Biol. 12:4026-4037(1992).
[1528] 646 Tudor domain

Domain of unknown function present in several RNA-binding proteins, with copies in the Drosophila Tudor protein. Slight
ambiguities in the alignment. Number of members: 18

[1] Medline: 97200561. Tudor domains in proteins that interact with RNA. Ponting CP; Trends Biochem Sci 1997;22:
51-52. [2] Medline: 97157029. The human EBNA-2 coactivator p100: multidomain organization and relationship to the
staphylococcal nuclease fold and to the tudor protein involved in Drosophila melanogaster development. Callebaut I,
Mornon JP; Biochem J 1997;321:125-132.
[1529] 647. Terpene synthase family

It has been suggested that this gene family be designated tps (for terpene synthase) [1]. It has been split into six
subgroups on the basis of phylogeny, called tpsa-tpsf. tpsa includes vetispiradiene synthase Swiss:Q39979,
5-epi-aristolochene synthase Swiss:Q40577 and (+)-delta-cadinene synthase Swiss:P93665.
tpsb includes (-)-limonene synthase Swiss:Q40322.
tpsc includes kaurene synthase A Swiss:O04408.
tpsd includes taxadiene synthase Swiss:Q41594, pinene synthase
Swiss:O24475 and myrcene synthase Swiss:O24474.
tpse includes kaurene synthase B.
tpsf includes linalool synthase.
Number of members: 51

[1] Medline: 97413772. Monoterpene synthases from grand fir (Abies grandis). cDNA isolation, characterization, and
functional expression of myrcene synthase, (-)-(4S)-limonene synthase, and (-)-(1S,5S)-pinene synthase.
Bohlmann J, Steele CL, Croteau R; J Biol Chem 1997;272:21784-21792.
[1530] 648 ThiF family

This family contains a repeated domain in ubiquitin activating enzyme E1 and members of the bacterial
ThiF/MoeB/HesA family. Number of members: 51
[1531] 649 Thioester dehydrase

Members of this family are involved in fatty acid biosynthesis.
Number of members: 19

[1] Medline: 96398612. Structure of a dehydratase-isomerase from the bacterial pathway for biosynthesis of
unsaturated fatty acids: two catalytic activities in one active site. Leesong M, Henderson BS, Gillig JR, Schwab JM,
Smith JL; Structure 1996;4:253-264.
Database reference: SCOP; 1mka; fa; [SCOP-USA][CATH-PDBSUM]
Database reference: PFAMB; PB058036;
[1532] 650 Tub family signatures

The mouse tubby mutation is the cause of maturity-onset obesity, insulin resistance and sensory deficits. This mutation
maps to a gene, tub [1,2], which codes for a protein that belongs to a family which currently consists of the following
members: - Mammalian tub, a hydrophilic protein of about 500 residues which could be involved in the hypothalamic
regulation of body weight. - Human protein TULP1 [3], which may be involved in retinitis pigmentosa 14, a retinal
degeneration disease. - Mouse protein p4-6, whose function is not known. - Caenorhabditis elegans hypothetical protein
F10B5.4. - Several fragmentary sequences from plants, Drosophila and human ESTs. While the N-terminal part of
these proteins is not conserved in length nor in sequence, the C-terminal 250 residues are highly conserved.
Therefore two regions were selected in the C-terminal part as signature patterns. The second region is located at the
C-terminal extremity and contains a penultimate cysteine residue that could be critical to the normal functioning of these
proteins.

Consensus pattern: F-[KHQ]-G-R-V-[ST]-x-A-S-V-K-N-F-Q

Consensus pattern: A-F-[AG]-I-[SAC]-[LIVM]-[ST]-S-F-x-[GST]-K-x-A-C-E

[1] Kleyn P.W., Fan W., Kovats S.G., Lee J.L., Pulido J.C., Wu Y., Berkemeier L.R., Misumi D.J., Holmgren I., Charlat
O., Woolf E.A., Tayber O., Brody T., Shu P., Hawkins F., Kennedy B., Baldini L., Ebeling C., Alperin G.D., Deeds J.,
Lakey N.D., Culpepper J., Chen H., Gluecksmann-Kuis M.A., Carlson G.A., Duyk G.M., Moore K.J. Cell 85:281-290
(1996). [2] Noben-Trauth K., Naggert J.K., North M.A., Nishina P.M. Nature 380:534-538(1996). [3] North M.A., Naggert
J.K., Yan Y., Noben-Trauth K., Nishina P.M. Proc. Natl. Acad. Sci. U.S.A. 94:3128-3133(1997).
[1533] 651. Eukaryotic DNA topoisomerase I active site

DNA topoisomerase I (EC 5.99.1.2) [1,2,3,4,E1] is one of the two types of enzyme that catalyze the interconversion of topological DNA isomers. Type I topoisomerases act by catalyzing the transient breakage of DNA, one strand at a time, and the subsequent rejoining of the strands. When a eukaryotic type I topoisomerase breaks a DNA backbone bond, it simultaneously forms a protein-DNA link where the hydroxyl group of a tyrosine residue is joined to a 3'-phosphate on DNA, at one end of the enzyme-severed DNA strand. In eukaryotes and poxvirus topoisomerases I, there are a number of conserved residues in the region around the active site tyrosine.
Consensus pattern: [DEN]-x(6)-[GS]-[IT]-S-K-x(2)-Y-[LIVM]-x(3)-[LIVM] [Y is the active site tyrosine]
[1] Sternglanz R. Curr. Opin. Cell Biol. 1:533-535(1990). [2] Sharma A., Mondragon A. Curr. Opin. Struct. Biol. 5:39-47(1995). [3] Lynn R.M., Bjornsti M.-A., Caron P.R., Wang J.C. Proc. Natl. Acad. Sci. U.S.A. 86:3559-3563(1989). [4] Roca J. Trends Biochem. Sci. 20:156-160(1995). [E1]
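
Consensus patterns of this kind can be checked against a sequence by translating them into a regular expression. The following is a minimal sketch in Python, not part of the original disclosure. It assumes only the basic pattern syntax (single residues, bracketed residue sets, `x` for any residue, and repeat counts such as `x(6)` or `x(0,2)`). The function name `prosite_to_regex` and the toy sequence are hypothetical and chosen for illustration only.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a basic PROSITE-style pattern into a Python regex string.

    Handles single residues, bracketed sets, 'x' wildcards and repeat
    counts such as x(6) or x(0,2). Exclusion sets in braces are not handled.
    """
    parts = []
    for element in pattern.split("-"):
        # Separate the residue part from an optional (n) or (m,n) suffix.
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        residue, low, high = m.group(1), m.group(2), m.group(3)
        if residue == "x":
            residue = "."  # any amino acid
        if low is None:
            parts.append(residue)
        elif high is None:
            parts.append(f"{residue}{{{low}}}")
        else:
            parts.append(f"{residue}{{{low},{high}}}")
    return "".join(parts)


# Topoisomerase I active-site pattern as given in the text.
TOPO_I = "[DEN]-x(6)-[GS]-[IT]-S-K-x(2)-Y-[LIVM]-x(3)-[LIVM]"

# Hypothetical toy sequence built to contain one match (not a real protein).
toy = "MK" + "DAAAAAAGISKAAYLAAAV" + "GG"

hit = re.search(prosite_to_regex(TOPO_I), toy)
# The tyrosine is the 14th pattern position, i.e. offset 13 from the match start.
tyr_index = hit.start() + 13
print(prosite_to_regex(TOPO_I), tyr_index, toy[tyr_index])
```

Because every element before the tyrosine has a fixed length in this particular pattern, the active-site position follows directly from the start of the match; patterns with variable-length gaps would need capture groups instead.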
[1534] 652. Transaldolase signatures

Transaldolase (EC 2.2.1.2) catalyzes the reversible transfer of a three-carbon ketol unit from sedoheptulose 7-phosphate to glyceraldehyde 3-phosphate to form erythrose 4-phosphate and fructose 6-phosphate. This enzyme, together with transketolase, provides a link between the glycolytic and pentose-phosphate pathways. Transaldolase is an enzyme of about 34 Kd whose sequence has been well conserved throughout evolution. A lysine has been implicated [1] in the catalytic mechanism of the enzyme; it acts as a nucleophilic group that attacks the carbonyl group of fructose 6-phosphate. Transaldolase is evolutionarily related [2] to a bacterial protein of about 20 Kd (known as talC in Escherichia coli), whose exact function is not yet known. Two signature patterns have been developed for these proteins. The first, located in the N-terminal section, contains a perfectly conserved pentapeptide; the second includes the active site lysine.

Consensus pattern: [DG]-[IVSA]-T-[ST]-N-P-[STA]-[LIVMF](2)
Consensus pattern: [LIVM]-x-[LIVM]-K-[LIVM]-[PAS]-...
[K is the active site residue]

[1] Miosga T., Schaaff-Gerstenschlaeger I., Franken E., Zimmermann F.K. Yeast 9:1241-1249(1993). [2] Reizer J., Reizer A., Saier M.H. Jr. Microbiology 141:961-971(1995).
[1535] 653. (Transpeptidase) Penicillin binding protein transpeptidase domain
[1536] The active site serine (residue 337 in Swiss:P14677) is conserved in all members of this family.
[1537] [1] Pares S., Mouz N., Petillot Y., Hakenbeck R., Dideberg O. Nat. Struct. Biol. 1996;3:284-289.

[1538] 654. Trehalase signatures

Trehalase (EC 3.2.1.28) is the enzyme responsible for the degradation of the disaccharide alpha,alpha-trehalose, yielding two glucose subunits [1]. It is an enzyme found in a wide variety of organisms and whose sequence has been highly conserved throughout evolution. Two of the most highly conserved regions have been selected as signature patterns. The first pattern is located in the central section; the second one is in the C-terminal region.

Consensus pattern: P-G-G-R-F-x-E-x-Y-x-W-D-x-Y

Consensus pattern: Q-W-D-x-P-x-[GA]-W-[PAS]-P

[1] Kopp M., Mueller H., Holzer H. J. Biol. Chem. 268:4766-4774(1993). [2] Henrissat B., Bairoch A. Biochem. J. 293:781-788(1993). [E1]

[1539] 655. Trehalose-6-phosphate synthase domain
[1540] OtsA (trehalose-6-phosphate synthase) is homologous to regions in the subunits of the yeast trehalose-6-phosphate synthase/phosphatase complex [1].

[1541] [1] Kaasen I., McDougall J., Strom A.R. Gene 1994;145:9-15.
[1542] 656. Tropomyosins signature 
Tropomyosins [1,2] are a family of closely related proteins present in muscle and non-muscle cells. In striated muscle, tropomyosin mediates the interactions between the troponin complex and actin so as to regulate muscle contraction. The role of tropomyosin in smooth muscle and non-muscle tissues is not clear. Tropomyosin is an alpha-helical protein that forms a coiled-coil dimer. Muscle isoforms of tropomyosin are characterized by having 284 amino acid residues and a highly conserved N-terminal region, whereas non-muscle forms are generally smaller and are heterogeneous in their N-terminal region. The signature pattern for tropomyosins is based on a very conserved region in the C-terminal section of tropomyosins, which is present in both muscle and non-muscle forms.
Consensus pattern: L-K-E-A-E-x-R-A-E

[1] Smillie L.B. Trends Biochem. Sci. 4:151-155(1979). [2] MacLeod A.R. BioEssays 6:208-212(1986).
[1543] 657. Troponin

Troponin (Tn) contains three subunits: Ca2+ binding (TnC), inhibitory (TnI), and tropomyosin binding (TnT). This Pfam contains members of the TnT subunit.

Troponin is a complex of three proteins: Ca2+ binding (TnC), inhibitory (TnI), and tropomyosin binding (TnT).
The troponin complex regulates Ca++ induced muscle contraction.
This family includes troponin T and troponin I. Troponin I binds to actin and troponin T binds to tropomyosin.
Number of members: 81

[1] Medline: 87144593
Structure of co-crystals of tropomyosin and troponin.
White SP, Cohen C, Phillips GN Jr;
Nature 1987;325:826-828.

[2] Medline: 95155315
A direct regulatory role for troponin T and a dual role for troponin C in the Ca2+ regulation of muscle contraction.
Potter JD, Sheng Z, Pan BS, Zhao J;
J Biol Chem 1995;270:2557-2562.

[3] Medline: 95324796
The troponin complex and regulation of muscle contraction.
Farah CS, Reinach FC;
FASEB J 1995;9:755-767.
[1544] 658. (Tryp mucin) Mucin-like glycoprotein

[1545] This family of trypanosomal proteins resembles vertebrate mucins. The protein consists of three regions. The N and C termini are conserved between all members of the family, whereas the central region is not well conserved and contains a large number of threonine residues which can be glycosylated [1]. Indirect evidence suggested that these genes might encode the core protein of parasite mucins, glycoproteins that were proposed to be involved in the interaction with, and invasion of, mammalian host cells.

[1] Di Noia JM, Sanchez DO, Frasch AC; J Biol Chem 1995;270:24146-24149.

[2] Di Noia JM, D'Orso I, Aslund L, Sanchez DO, Frasch AC; J Biol Chem 1998;273:10843-10850.

[1546] 659. Aminoacyl-transfer RNA synthetases class-I signature (tRNA synt 1)

Aminoacyl-tRNA synthetases (EC 6.1.1.-) [1] are a group of enzymes which activate amino acids and transfer them to specific tRNA molecules as the first step in protein biosynthesis. In prokaryotic organisms there are at least twenty different types of aminoacyl-tRNA synthetases, one for each different amino acid. In eukaryotes there are generally two aminoacyl-tRNA synthetases for each different amino acid: one cytosolic form and a mitochondrial form. While all these enzymes have a common function, they are widely diverse in terms of subunit size and of quaternary structure. A few years ago it was found [2] that several aminoacyl-tRNA synthetases share a region of similarity in their N-terminal section; in particular, the consensus tetrapeptide His-Ile-Gly-His ('HIGH') is very well conserved. The 'HIGH' region has been shown [3] to be part of the adenylate binding site. The 'HIGH' signature has been found in the aminoacyl-tRNA synthetases specific for arginine, cysteine, glutamic acid, glutamine, isoleucine, leucine, methionine, tyrosine, tryptophan, and valine. These aminoacyl-tRNA synthetases are referred to as class-I synthetases [4,5,6] and seem to share the same tertiary structure based on a Rossmann fold.

Consensus pattern: P-x(0,2)-[GSTAN]-[DENQGAPK]-x-[LIVMFP]-[HT]-[LIVMYAC]-G-[HNTG]-[LIVMFYSTAGPC]

[1] Schimmel P. Annu. Rev. Biochem. 56:125-158(1987). [2] Webster T., Tsai H., Kula M., Mackie G.A., Schimmel P. Science 226:1315-1317(1984). [3] Brick P., Bhat T.N., Blow D.M. J. Mol. Biol. 208:83-98(1988). [4] Delarue M., Moras D. BioEssays 15:675-687(1993). [5] Schimmel P. Trends Biochem. Sci. 16:1-3(1991). [6] Nagel G.M., Doolittle R.F. Proc. Natl. Acad. Sci. U.S.A. 88:8121-8125(1991).

[1547] 660. Aminoacyl-transfer RNA synthetases class-I signature (tRNA synt 1b)

Aminoacyl-tRNA synthetases (EC 6.1.1.-) [1] are a group of enzymes which activate amino acids and transfer them to specific tRNA molecules as the first step in protein biosynthesis. In prokaryotic organisms there are at least twenty different types of aminoacyl-tRNA synthetases, one for each different amino acid. In eukaryotes there are generally two aminoacyl-tRNA synthetases for each different amino acid: one cytosolic form and a mitochondrial form. While all these enzymes have a common function, they are widely diverse in terms of subunit size and of quaternary structure. A few years ago it was found [2] that several aminoacyl-tRNA synthetases share a region of similarity in their N-terminal section; in particular, the consensus tetrapeptide His-Ile-Gly-His ('HIGH') is very well conserved. The 'HIGH' region has been shown [3] to be part of the adenylate binding site. The 'HIGH' signature has been found in the aminoacyl-tRNA synthetases specific for arginine, cysteine, glutamic acid, glutamine, isoleucine, leucine, methionine, tyrosine, tryptophan, and valine. These aminoacyl-tRNA synthetases are referred to as class-I synthetases [4,5,6] and seem to share the same tertiary structure based on a Rossmann fold.

Consensus pattern: P-x(0,2)-[GSTAN]-[DENQGAPK]-x-[LIVMFP]-[HT]-[LIVMYAC]-G-[HNTG]-[LIVMFYSTAGPC]

[1] Schimmel P. Annu. Rev. Biochem. 56:125-158(1987). [2] Webster T., Tsai H., Kula M., Mackie G.A., Schimmel P. Science 226:1315-1317(1984). [3] Brick P., Bhat T.N., Blow D.M. J. Mol. Biol. 208:83-98(1988). [4] Delarue M., Moras D. BioEssays 15:675-687(1993). [5] Schimmel P. Trends Biochem. Sci. 16:1-3(1991). [6] Nagel G.M., Doolittle R.F. Proc. Natl. Acad. Sci. U.S.A. 88:8121-8125(1991).

[1548] 661. (tRNA-synt 1c) tRNA synthetases class I (E and Q)

[1549] Other tRNA synthetase sub-families are too dissimilar to be included.

This family includes only glutamyl and glutaminyl tRNA synthetases.

In some organisms, a single glutamyl-tRNA synthetase aminoacylates both tRNA(Glu) and tRNA(Gln).
[1550] [1] Rath VL, Silvian LF, Beyer B, Sproat BS, Steitz TA; Structure 1998;6:439-449.
[1551] 662. (tRNA-synt 1d) tRNA synthetases class I (R)
[1552] Other tRNA synthetase sub-families are too dissimilar to be included.
This family includes only arginyl tRNA synthetase.

[1553] 663. Aminoacyl-transfer RNA synthetases class-II signatures (tRNA synt 2)

Aminoacyl-tRNA synthetases (EC 6.1.1.-) [1] are a group of enzymes which activate amino acids and transfer them to specific tRNA molecules as the first step in protein biosynthesis. In prokaryotic organisms there are at least twenty different types of aminoacyl-tRNA synthetases, one for each different amino acid. In eukaryotes there are generally two aminoacyl-tRNA synthetases for each different amino acid: one cytosolic form and a mitochondrial form. While all these enzymes have a common function, they are widely diverse in terms of subunit size and of quaternary structure. The synthetases specific for alanine, asparagine, aspartic acid, glycine, histidine, lysine, phenylalanine, proline, serine, and threonine are referred to as class-II synthetases [2 to 6] and probably have a common folding pattern in their catalytic domain for the binding of ATP and amino acid, which is different from the Rossmann fold observed for the class I synthetases [7]. Class-II tRNA synthetases do not share a high degree of similarity; however, at least three conserved regions are present [2,5,8]. Signature patterns have been derived from two of these regions.

Consensus pattern: [FYH]-R-x-[DE]-x(4,12)-[RH]-x(3)-F-x(3)-[DE]

Consensus pattern: [GSTALVF]-{DENQHRKP}-[GSTA]-[LIVMF]-[DE]-R-[LIVMF]-x-[LIVMSTAG]-[LIVMFY]
[1] Schimmel P. Annu. Rev. Biochem. 56:125-158(1987). [2] Delarue M., Moras D. BioEssays 15:675-687(1993). [3] Schimmel P. Trends Biochem. Sci. 16:1-3(1991). [4] Nagel G.M., Doolittle R.F. Proc. Natl. Acad. Sci. U.S.A. 88:8121-8125(1991). [5] Cusack S., Haertlein M., Leberman R. Nucleic Acids Res. 19:3489-3498(1991). [6] Cusack S. Biochimie 75:1077-1081(1993). [7] Cusack S., Berthet-Colominas C., Haertlein M., Nassar N., Leberman R. Nature 347:249-255(1990). [8] Leveque F., Plateau P., Dessen P., Blanquet S. Nucleic Acids Res. 18:305-312(1990).
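
The second class-II signature uses a brace-delimited element, {DENQHRKP}, which in this pattern notation means that any residue except those listed is accepted at that position. A short illustrative sketch of a hand translation into a Python regular expression follows. The names `CLASS_II_SIG2` and `has_class_ii_signature` and the two ten-residue test strings are hypothetical and are not taken from any real synthetase.

```python
import re

# Hand translation of
# [GSTALVF]-{DENQHRKP}-[GSTA]-[LIVMF]-[DE]-R-[LIVMF]-x-[LIVMSTAG]-[LIVMFY]
# The braces become a negated character class; 'x' becomes '.'.
CLASS_II_SIG2 = re.compile(
    r"[GSTALVF][^DENQHRKP][GSTA][LIVMF][DE]R[LIVMF].[LIVMSTAG][LIVMFY]"
)


def has_class_ii_signature(sequence: str) -> bool:
    """Return True if the sequence contains the second class-II signature."""
    return CLASS_II_SIG2.search(sequence) is not None


# Hypothetical peptides: the second differs only at the excluded position.
print(has_class_ii_signature("GAGLERLALF"))  # alanine is allowed at position 2
print(has_class_ii_signature("GDGLERLALF"))  # aspartate is excluded at position 2
```

The negated class is the only element here that rejects residues rather than listing accepted ones, so a single substitution at that position is enough to lose the match.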
[1554] 664. Aminoacyl-transfer RNA synthetases class-I signature (tRNA synt 1e)

Aminoacyl-tRNA synthetases (EC 6.1.1.-) [1] are a group of enzymes which activate amino acids and transfer them to specific tRNA molecules as the first step in protein biosynthesis. In prokaryotic organisms there are at least twenty different types of aminoacyl-tRNA synthetases, one for each different amino acid. In eukaryotes there are generally two aminoacyl-tRNA synthetases for each different amino acid: one cytosolic form and a mitochondrial form. While all these enzymes have a common function, they are widely diverse in terms of subunit size and of quaternary structure. A few years ago it was found [2] that several aminoacyl-tRNA synthetases share a region of similarity in their N-terminal section; in particular, the consensus tetrapeptide His-Ile-Gly-His ('HIGH') is very well conserved. The 'HIGH' region has been shown [3] to be part of the adenylate binding site. The 'HIGH' signature has been found in the aminoacyl-tRNA synthetases specific for arginine, cysteine, glutamic acid, glutamine, isoleucine, leucine, methionine, tyrosine, tryptophan, and valine. These aminoacyl-tRNA synthetases are referred to as class-I synthetases [4,5,6] and seem to share the same tertiary structure based on a Rossmann fold.

Consensus pattern: P-x(0,2)-[GSTAN]-[DENQGAPK]-x-[LIVMFP]-[HT]-[LIVMYAC]-G-[HNTG]-[LIVMFYSTAGPC]
[1] Schimmel P. Annu. Rev. Biochem. 56:125-158(1987). [2] Webster T., Tsai H., Kula M., Mackie G.A., Schimmel P. Science 226:1315-1317(1984). [3] Brick P., Bhat T.N., Blow D.M. J. Mol. Biol. 208:83-98(1988). [4] Delarue M., Moras D. BioEssays 15:675-687(1993). [5] Schimmel P. Trends Biochem. Sci. 16:1-3(1991). [6] Nagel G.M., Doolittle R.F. Proc. Natl. Acad. Sci. U.S.A. 88:8121-8125(1991).
[1555] 665. Aminoacyl-transfer RNA synthetases class-II signatures (tRNA synt 2d)

Aminoacyl-tRNA synthetases (EC 6.1.1.-) [1] are a group of enzymes which activate amino acids and transfer them to specific tRNA molecules as the first step in protein biosynthesis. In prokaryotic organisms there are at least twenty different types of aminoacyl-tRNA synthetases, one for each different amino acid. In eukaryotes there are generally two aminoacyl-tRNA synthetases for each different amino acid: one cytosolic form and a mitochondrial form. While all these enzymes have a common function, they are widely diverse in terms of subunit size and of quaternary structure. The synthetases specific for alanine, asparagine, aspartic acid, glycine, histidine, lysine, phenylalanine, proline, serine, and threonine are referred to as class-II synthetases [2 to 6] and probably have a common folding pattern in their catalytic domain for the binding of ATP and amino acid, which is different from the Rossmann fold observed for the class I synthetases [7]. Class-II tRNA synthetases do not share a high degree of similarity; however, at least three conserved regions are present [2,5,8]. Signature patterns have been derived from two of these regions.
Consensus pattern: [FYH]-R-x-[DE]-x(4,12)-[RH]-x(3)-F-x(3)-[DE]

Consensus pattern: [GSTALVF]-{DENQHRKP}-[GSTA]-[LIVMF]-[DE]-R-[LIVMF]-x-[LIVMSTAG]-[LIVMFY]

[1] Schimmel P. Annu. Rev. Biochem. 56:125-158(1987). [2] Delarue M., Moras D. BioEssays 15:675-687(1993). [3] Schimmel P. Trends Biochem. Sci. 16:1-3(1991). [4] Nagel G.M., Doolittle R.F. Proc. Natl. Acad. Sci. U.S.A. 88:8121-8125(1991). [5] Cusack S., Haertlein M., Leberman R. Nucleic Acids Res. 19:3489-3498(1991). [6] Cusack S. Biochimie 75:1077-1081(1993). [7] Cusack S., Berthet-Colominas C., Haertlein M., Nassar N., Leberman R. Nature 347:249-255(1990). [8] Leveque F., Plateau P., Dessen P., Blanquet S. Nucleic Acids Res. 18:305-312(1990).
[1556] 666. Thaumatin family signature
Thaumatin [1] is an intensely sweet-tasting protein (100 000 times sweeter than sucrose on a molar basis) from Thaumatococcus daniellii, an African bush. The protein is made of about 200 residues and contains 8 disulfide bonds. A number of proteins have been found to be related to thaumatins. These proteins are listed below (references are only provided for recently determined sequences). - A maize alpha-amylase/trypsin inhibitor. - Two tobacco pathogenesis-related proteins: PR-R major and minor forms, which are induced after infection with viruses. - Salt-induced protein NP24 from tomato. - Osmotin, a salt-induced protein from tobacco. - Osmotin-like proteins OSML13, OSML15 and OSML81 from potato [2]. - P21, a leaf protein from soybean. - PWIR2, a leaf protein from wheat. - Zeamatin, a maize antifungal protein [3]. The exact biological function of all these proteins is not yet known. A conserved region that includes three cysteine residues known (in thaumatin) to be involved in disulfide bonds has been selected as a signature pattern.

[Schematic of the thaumatin sequence marking the conserved cysteines and the position of the pattern; the diagram is not recoverable from the source.]

'C': conserved cysteine involved in a disulfide bond. '*': position of the pattern.

Consensus pattern: G-x-[GF]-x-C-x-T-[GA]-D-C-x(1,2)-G-x(2,3)-C

[1] Edens L., Heslinga L., Klok R., Ledeboer A.M., Maat J., Toonen M.Y., Visser C., Verrips C.T. Gene 18:1-12(1982). [2] Zhu B., Chen T.H.H., Li P.H. Plant Physiol. 108:929-937(1995). [3] Malehorn D.E., Borgmeyer J.R., Smith C.E., Shah D.M. Plant Physiol. 106:1471-1481(1994).
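
Because the thaumatin signature contains two variable-length gaps, x(1,2) and x(2,3), the positions of its three conserved cysteines cannot be derived from a fixed offset. One way to recover them, sketched below purely for illustration, is to wrap each cysteine in a capture group. The function name `thaumatin_cysteines` and the toy sequence are hypothetical; positions are reported as 0-based indices.

```python
import re

# G-x-[GF]-x-C-x-T-[GA]-D-C-x(1,2)-G-x(2,3)-C with each cysteine captured.
THAUMATIN_SIG = re.compile(r"G.[GF].(C).T[GA]D(C).{1,2}G.{2,3}(C)")


def thaumatin_cysteines(sequence: str) -> list:
    """Return 0-based positions of the three signature cysteines, or []."""
    match = THAUMATIN_SIG.search(sequence)
    if match is None:
        return []
    # Groups 1 to 3 are the three captured cysteines, in sequence order.
    return [match.start(group) for group in (1, 2, 3)]


# Hypothetical toy sequence (not a real thaumatin) containing one match.
toy = "MS" + "GAGACATGDCAGAAC"
print(thaumatin_cysteines(toy))
```

A match only shows that the three cysteines are present at pattern-compatible spacing; which of them pair in disulfide bonds is not something the pattern itself encodes.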
[1557] 667. Thiolases signatures

Two different types of thiolase [1,2,3] are found both in eukaryotes and in prokaryotes: acetoacetyl-CoA thiolase (EC 2.3.1.9) and 3-ketoacyl-CoA thiolase (EC 2.3.1.16). 3-Ketoacyl-CoA thiolase (also called thiolase I) has a broad chain-length specificity for its substrates and is involved in degradative pathways such as fatty acid beta-oxidation. Acetoacetyl-CoA thiolase (also called thiolase II) is specific for the thiolysis of acetoacetyl-CoA and is involved in biosynthetic pathways such as poly beta-hydroxybutyrate synthesis or steroid biogenesis. In eukaryotes, there are two forms of 3-ketoacyl-CoA thiolase: one located in the mitochondrion and the other in peroxisomes. There are two conserved cysteine residues important for thiolase activity. The first, located in the N-terminal section of the enzymes, is involved in the formation of an acyl-enzyme intermediate; the second, located at the C-terminal extremity, is the active site base involved in deprotonation in the condensation reaction. Mammalian nonspecific lipid-transfer protein (nsL-TP) (also known as sterol carrier protein 2) is a protein which seems to exist in two different forms: a 14 Kd protein (SCP-2) and a larger 58 Kd protein (SCP-x). The former is found in the cytoplasm or the mitochondria and is involved in lipid transport; the latter is found in peroxisomes.

The C-terminal part of SCP-x is identical to SCP-2, while the N-terminal portion is evolutionarily related to thiolases [4]. Three signature patterns have been developed for this family of proteins, two of which are based on the regions around the biologically important cysteines. The third is based on a highly conserved region in the C-terminal part of these proteins.

Consensus pattern: [LIVM]-[NST]-x(2)-C-[SAGLI]-[ST]-[SAG]-[LIVMFYNS]-x-[STAG]-[LIVM]-x(6)-[LIVM] [C is involved in formation of acyl-enzyme intermediate]

Consensus pattern: N-x(2)-G-G-x-[LIVM]-[SA]-x-G-H-P-x-[GA]-x-[ST]-G

Consensus pattern: [AG]-[LIVMA]-[STAGCLIVM]-[STAG]-[LIVMA]-C-x-[AG]-x-[AG]-x-[AG]-x-[SAG] [C is the active site residue]

[1] Peoples O.P., Sinskey A.J. J. Biol. Chem. 264:15293-15297(1989). [2] Yang S.-Y., Yang X.-Y.H., Healy-Louie G., Schulz H., Elzinga M. J. Biol. Chem. 265:10424-10429(1990). [3] Igual J.C., Gonzalez-Bosch C., Dopazo J., Perez-Ortin J.E. J. Mol. Evol. 35:147-155(1992). [4] Baker M.E., Billheimer J.T., Strauss J.F. III DNA Cell Biol. 10:695-698(1991).

[1558] 668. Thioredoxin family active site

Thioredoxins [1 to 4] are small proteins of approximately one hundred amino-acid residues which participate in various redox reactions via the reversible oxidation of an active center disulfide bond. They exist in either a reduced form or an oxidized form where the two cysteine residues are linked in an intramolecular disulfide bond. Thioredoxin is present in prokaryotes and eukaryotes, and the sequence around the redox-active disulfide bond is well conserved. Bacteriophage T4 also encodes a thioredoxin, but its primary structure is not homologous to bacterial, plant and vertebrate thioredoxins. A number of eukaryotic proteins contain domains evolutionarily related to thioredoxin; all of them seem to be protein disulphide isomerases (PDI). PDI (EC 5.3.4.1) [5,6,7] is an endoplasmic reticulum enzyme that catalyzes the rearrangement of disulfide bonds in various proteins. The various forms of PDI which are currently known are: - PDI major isozyme, a multifunctional protein that also functions as the beta subunit of prolyl 4-hydroxylase (EC 1.14.11.2), as a component of oligosaccharyl transferase (EC 2.4.1.119), as thyroxine deiodinase (EC 3.8.1.4), as glutathione-insulin transhydrogenase (EC 1.8.4.2) and as a thyroid hormone-binding protein. - ERp60 (ER-60; 58 Kd microsomal protein). ERp60 was originally thought to be a phosphoinositide-specific phospholipase C isozyme and later to be a protease. - ERp72. - P5. All PDI contain two or three (ERp72) copies of the thioredoxin domain. Bacterial proteins that act as thiol:disulfide interchange proteins, allowing disulfide bond formation in some periplasmic proteins, also contain a thioredoxin domain. These proteins are: - Escherichia coli dsbA (or prfA) and its orthologs in Vibrio cholerae (tcpG) and Haemophilus influenzae (por). - Escherichia coli dsbC (or xprA) and its orthologs in Erwinia chrysanthemi and Haemophilus influenzae. - Escherichia coli dsbD (or dipZ) and its Haemophilus influenzae ortholog. - Escherichia coli dsbE (or ccmG) and orthologs in Haemophilus influenzae, Rhodobacter capsulatus (helX) and Rhizobiaceae (cycY and tlpA).

Consensus pattern: [LIVMF]-[LIVMSTA]-x-[LIVMFYC]-[FYWSTHE]-x(2)-[FYWGTN]-C-[GATPLVE]-[PHYWSTA]-C-x(6)-[LIVMFYWT] [The two C's form the redox-active bond]

[1] Holmgren A. Annu. Rev. Biochem. 54:237-271(1985). [2] Gleason F.K., Holmgren A. FEMS Microbiol. Rev. 54:271-297(1988). [3] Holmgren A. J. Biol. Chem. 264:13963-13966(1989). [4] Eklund H., Gleason F.K., Holmgren A. Proteins 11:13-28(1991). [5] Freedman R.B., Hawkins H.C., Murant S.J., Reid I. Biochem. Soc. Trans. 16:98-99(1988). [6] Kivirikko K.I., Myllyla R., Pihlajaniemi T. FASEB J. 3:1609-1617(1989). [7] Freedman R.B., Hirst T.R., Tuite M.F. Trends Biochem. Sci. 19:331-336(1994).

[1559] 669. (Transcript fac2) Transcription factor TFIIB repeat signature

In eukaryotes the initiation of transcription of protein encoding genes by polymerase II is modulated by general and specific transcription factors. The general transcription factors operate through common promoter elements (such as the TATA box). At least seven different proteins associate to form the general transcription factors: TFIIA, -IIB, -IID, -IIE, -IIF, -IIG, and -IIH [1]. Transcription factor IIB (TFIIB) plays a central role in the transcription of class II genes; it associates with a complex of TFIID-IIA bound to DNA (DA complex) to form a ternary complex TFIID-IIA-IIB (DAB complex), which is then recognized by RNA polymerase II [2,3]. TFIIB is a protein of about 315 to 340 amino acid residues which contains, in its C-terminal part, an imperfect repeat of a domain of about 75 residues. This repeat could contribute an element of symmetry to the folded protein. The following proteins have been shown to be evolutionarily related to TFIIB: - An archaebacterial TFIIB homolog. In Pyrococcus woesei a previously undetected open reading frame has been shown [4] to be highly related to TFIIB. - Fungal transcription factor IIIB 70 Kd subunit (gene PCF4/TDS4/BRF1) [5]. This protein is a general activator of RNA polymerase III transcription and plays a role analogous to that of TFIIB in pol III transcription. The central section of the repeated domain, which is the most conserved part of that domain, has been selected as a signature pattern.

Consensus pattern: G-[KR]-x(3)-[STAGN]-x-[LIVMYA]-[GSTA](2)-[CSAV]-[LIVM]-[LIVMFY]-[LIVMA]-[GSA]-[STAC]

[1] Weinmann R. Gene Expr. 2:81-91(1992). [2] Hawley D. Trends Biochem. Sci. 16:317-318(1991). [3] Ha I., Lane W.S., Reinberg D. Nature 352:689-695(1991). [4] Ouzounis C., Sander C. Cell 71:189-190(1992). [5] Khoo B., Brophy B., Jackson S.P. Genes Dev. 8:2879-2890(1994).
[1560] 670. (transcript fact) MADS-box domain signature and profile

A number of transcription factors contain a conserved domain of 56 amino-acid residues, sometimes known as the MADS-box domain [E1]. They are listed below. - Serum response factor (SRF) [1], a mammalian transcription factor that binds to the Serum Response Element (SRE). This is a short sequence of dyad symmetry located 300 bp to the 5' end of the transcription initiation site of genes such as c-fos. - Mammalian myocyte-specific enhancer factors 2A to 2D (MEF2A to MEF2D). These proteins are transcription factors which bind specifically to the MEF2 element present in the regulatory regions of many muscle-specific genes. - Drosophila myocyte-specific enhancer factor 2 (MEF2). - Yeast GRM/PRTF protein (gene MCM1) [2], a transcriptional regulator of mating-type-specific genes. - Yeast arginine metabolism regulation protein I (gene ARGR1 or ARG80). - Yeast transcription factor RLM1. - Yeast transcription factor SMP1. - Arabidopsis thaliana agamous protein (AG) [3], a probable transcription factor involved in regulating genes that determine stamen and carpel development in wild-type flowers. Mutations in the AG gene result in the replacement of the stamens by petals and the carpels by a new flower. - Arabidopsis thaliana homeotic proteins Apetala1 (AP1), Apetala3 (AP3) and Pistillata (PI), which act locally to specify the identity of the floral meristem and to determine sepal and petal development [4]. - Antirrhinum majus and tobacco homeotic proteins deficiens (DEFA) and globosa (GLO) [5]. Both proteins are transcription factors involved in the genetic control of flower development. Mutations in DEFA or GLO cause the transformation of petals into sepals and of stamina into carpels. - Arabidopsis thaliana putative transcription factors AGL1 to AGL6 [6]. - Antirrhinum majus morphogenetic protein DEFH33 (squamosa). In SRF, the conserved domain has been shown [1] to be involved in DNA-binding and dimerization. A pattern that spans the complete length of the domain has been derived. The profile also spans the length of the MADS-box.
Consensus pattern: R-x-[RK]-x(5)-I-x-[DNGSK]-x(3)-[KR]-x(2)-T-[FY]-x-[RK](3)-x(2)-[LIVM]-x-K(2)-A-x-E-[LIVM]-[STA]-x-L-x(4)-[LIVM]-x-[LIVM](3)-x(6)-[LIVMF]-x(2)-[FY]

[1] Norman C., Runswick M., Pollock R., Treisman R. Cell 55:989-1003(1988). [2] Passmore S., Maine G.T., Elble R., Christ C., Tye B.-K. J. Mol. Biol. 204:593-606(1988). [3] Yanofsky M., Ma H., Bowman J., Drews G., Feldmann K.A., Meyerowitz E.M. Nature 346:35-39(1990). [4] Goto K., Meyerowitz E.M. Genes Dev. 8:1548-1560(1994). [5] Troebner W., Ramirez L., Motte P., Hue I., Huijser P., Loennig W.-E., Saedler H., Sommer H., Schwarz-Sommer Z. EMBO J. 11:4693-4704(1992). [6] Ma H., Yanofsky M.F., Meyerowitz E.M. Genes Dev. 5:484-495(1991). [E1]
[1561] 671. Transketolase signatures

Transketolase (EC 2.2.1.1) (TK) catalyzes the reversible transfer of a two-carbon ketol unit from xylulose 5-phosphate to an aldose receptor, such as ribose 5-phosphate, to form sedoheptulose 7-phosphate and glyceraldehyde 3-phosphate. This enzyme, together with transaldolase, provides a link between the glycolytic and pentose-phosphate pathways. TK requires thiamin pyrophosphate as a cofactor. In most sources where TK has been purified, it is a homodimer of approximately 70 Kd subunits. TK sequences from a variety of eukaryotic and prokaryotic sources [1,2] show that the enzyme has been evolutionarily conserved. In the peroxisomes of the methylotrophic yeast Hansenula polymorpha, there is a highly related enzyme, dihydroxy-acetone synthase (DHAS) (EC 2.2.1.3) (also known as formaldehyde transketolase), which exhibits a very unusual specificity by including formaldehyde amongst its substrates. 1-deoxyxylulose-5-phosphate synthase (DXP synthase) [3] is an enzyme so far found in bacteria (gene dxs) and plants (gene CLA1) which catalyzes the thiamin pyrophosphate-dependent acyloin condensation reaction between carbon atoms 2 and 3 of pyruvate and glyceraldehyde 3-phosphate to yield 1-deoxy-D-xylulose-5-phosphate (DXP), a precursor in the biosynthetic pathway to isoprenoids, thiamin (vitamin B1), and pyridoxol (vitamin B6). DXP synthase is evolutionarily related to TK. Two regions of TK have been selected as signature patterns. The first, located in the N-terminal section, contains a histidine residue which appears to function in proton transfer during catalysis [4]. The second, located in the central section, contains conserved acidic residues that are part of the active cleft and may participate in substrate-binding [4].

Consensus pattern: R-x(3)-[LIVMTA]-[DENQSTHKF]-x(5,6)-[GSN]-G-H-[PLIVMF]-[GSTA]-x(2)-[LIMC]-[GS]
Consensus pattern: G-[DEQGSA]-[DN]-G-[PAE...

[1] Abedinia M., Layfield R., Jones S.M., Nixon P.F., Mattick J.S. Biochem. Biophys. Res. Commun. 183:1159-1166(1992). [2] Fletcher T.S., Kwee I.L., Nakada T., Largman C., Martin B.M. Biochemistry 31:1892-1896(1992). [3] Sprenger G.A., Schorken U., Wiegert T., Grolle S., De Graaf A.A., Taylor S.V., Begley T.P., Bringer-Meyer S., Sahm H. Proc. Natl. Acad. Sci. U.S.A. 94:12857-12862(1997). [4] Lindqvist Y., Schneider G., Ermler U., Sundstroem M. EMBO J. 11:2373-2379(1992).

■to [1562] 672 Ttansmembrane 4 family signature 

Recently a number of eukaryotic cell surface antigens have been found to be evolutionarily related [1,2,3]. The proteins known to belong to this family are listed below:
- Mammalian antigen CD9 (MIC3); a protein involved in platelet activation and aggregation.
- Mammalian leukocyte antigen CD37, expressed on B lymphocytes.
- Mammalian leukocyte antigen CD53 (OX-44), which may be involved in growth regulation in hematopoietic cells.
- Mammalian lysosomal membrane protein CD63 (melanoma-associated antigen ME491; antigen AD1).
- Mammalian antigen CD81 (cell surface protein TAPA-1), which may play an important role in the regulation of lymphoma cell growth.
- Mammalian antigen CD82 (protein R2; antigen C33; Kangai 1 (KAI1)), which associates with CD4 or CD8 and delivers costimulatory signals for the TCR/CD3 pathway.
- Mammalian antigen CD151 (SFA-1; platelet-endothelial tetraspan antigen 3 (PETA-3)).
- Mammalian cell surface glycoprotein A15 (TALLA-1; MXS1).
- Mammalian novel antigen 2 (NAG-2).
- Human tumor-associated antigen CO-029.
- Schistosoma mansoni and japonicum 23 Kd surface antigen (SM23 / SJ23).
These proteins share the following characteristics: they all seem to be type III membrane proteins (type III proteins are integral membrane proteins that contain an N-terminal membrane-anchoring domain which is not cleaved during biosynthesis and which functions both as a translocation signal and as a membrane anchor); they also contain three additional transmembrane regions and at least seven conserved cysteine residues, and are of approximately the same size (218 to 284 residues). These proteins are collectively known as the 'transmembrane 4 superfamily' (TM4) because they span the plasma membrane four times. A schematic diagram of the domain structure of these proteins is shown below.

[Schematic diagram, layout not recoverable: the domains run in the order TMa | Extra | TM2 | Cyt | TM3 | Extracellular | TM4 | Cyt, with the conserved cysteines marked 'C'.]
'Cyt': cytoplasmic domain. 'TMa': transmembrane anchor. 'TM2' to 'TM4': transmembrane regions 2 to 4. 'C': conserved cysteine. '*': position of the pattern.

A conserved region that includes two cysteines and seems to be located in a short cytoplasmic loop between two transmembrane domains has been selected as a signature for these proteins.

Consensus pattern: G-x(3)-[LIVMF]-x(2)-[GSA]-[LIVMF](2)-G-C-x-[GA]-[STA]-x(2)-[EG]-x(2)-[CWN]-[LIVM](2)

[1] Levy S., Nguyen V.Q., Andria M.L., Takahashi S. J. Biol. Chem. 266:14597-14602(1991). [2] Tomlinson M.G., Williams A.F., Wright M.D. Eur. J. Immunol. 23:136-140(1993). [3] Barclay A.N., Birkeland M.L., Brown M.H., Beyers A.D., Davis S.J., Somoza C., Williams A.F. The leucocyte antigen factsbook. Academic Press, London / San Diego (1992).
[1563] 673. Tryptophan synthase alpha chain signature

Tryptophan synthase catalyzes the last step in the biosynthesis of tryptophan: the conversion of indoleglycerol phosphate and serine to tryptophan and glyceraldehyde 3-phosphate [1,2]. It has two functional domains: one for the aldol cleavage of indoleglycerol phosphate to indole and glyceraldehyde 3-phosphate, and the other for the synthesis of tryptophan from indole and serine. In bacteria and plants [3], each domain is found on a separate subunit (alpha and beta chains), while in fungi the two domains are fused together on a single multifunctional protein. A conserved region that contains three conserved acidic residues has been selected as a signature pattern for the alpha chain. The first and the third acidic residues are believed to serve as proton donors/acceptors in the enzyme's catalytic mechanism.
Consensus pattern: [LIVM]-E-[LIVM]-G-x(2)-[FYC]-[ST]-[DE]-[PA]-[LIVMY]-[AGLI]-[DE]-G

[1] Crawford I.P. Annu. Rev. Microbiol. 43:567-600(1989). [2] Hyde C.C., Miles E.W. Bio/Technology 8:27-32(1990). [3] Berlyn M.B., Last R.L., Fink G.R. Proc. Natl. Acad. Sci. U.S.A. 86:4604-4608(1989).
[1564] 674. Tryptophan synthase beta chain pyridoxal-phosphate attachment site

Tryptophan synthase catalyzes the last step in the biosynthesis of tryptophan: the conversion of indoleglycerol phosphate and serine to tryptophan and glyceraldehyde 3-phosphate [1,2]. It has two functional domains: one for the aldol cleavage of indoleglycerol phosphate to indole and glyceraldehyde 3-phosphate, and the other for the synthesis of tryptophan from indole and serine. In bacteria and plants [3], each domain is found on a separate subunit (alpha and beta chains), while in fungi the two domains are fused together on a single multifunctional protein. The beta chain of the enzyme requires pyridoxal-phosphate as a cofactor. The pyridoxal-phosphate group is attached to a lysine residue. The region around this lysine residue also contains two histidine residues which are part of the pyridoxal-phosphate binding site. The signature pattern for the tryptophan synthase beta chain is derived from that conserved region.

Consensus pattern: [LIVM]-x-H-x-G-[STA]-H-K-x-N [K is the pyridoxal-P attachment site]

[1] Crawford I.P. Annu. Rev. Microbiol. 43:567-600(1989). [2] Hyde C.C., Miles E.W. Bio/Technology 8:27-32(1990). [3] Berlyn M.B., Last R.L., Fink G.R. Proc. Natl. Acad. Sci. U.S.A. 86:4604-4608(1989).
[1565] 675. Serine proteases, trypsin family, active sites

The catalytic activity of the serine proteases from the trypsin family is provided by a charge relay system involving an aspartic acid residue hydrogen-bonded to a histidine, which itself is hydrogen-bonded to a serine. The sequences in the vicinity of the active site serine and histidine residues are well conserved in this family of proteases [1]. A partial list of proteases known to belong to the trypsin family is shown below.
- Acrosin.
- Blood coagulation factors VII, IX, X, XI and XII, thrombin, plasminogen, and protein C.
- Cathepsin G.
- Chymotrypsins.
- Complement components C1r, C1s, C2, and complement factors B, D and I.
- Complement-activating component of RA-reactive factor.
- Cytotoxic cell proteases (granzymes A to H).
- Duodenase I.
- Elastases 1, 2, 3A, 3B (protease E), leukocyte (medullasin).
- Enterokinase (EC 3.4.21.9) (enteropeptidase).
- Hepatocyte growth factor activator.
- Hepsin.
- Glandular (tissue) kallikreins (including EGF-binding protein types A, B, and C, NGF-gamma chain, gamma-renin, prostate specific antigen (PSA) and tonin).
- Plasma kallikrein.
- Mast cell proteases (MCP) 1 (chymase) to 8.
- Myeloblastin (proteinase 3) (Wegener's autoantigen).
- Plasminogen activators (urokinase-type, and tissue-type).
- Trypsins I, II, III, and IV.
- Tryptases.
- Snake venom proteases such as ancrod, batroxobin, cerastobin, flavoxobin, and protein C activator.
- Collagenase from common cattle grub and collagenolytic protease from Atlantic sand fiddler crab.
- Apolipoprotein(a).
- Blood fluke cercarial protease.
- Drosophila trypsin like proteases: alpha, easter, snake-locus.
- Drosophila protease stubble (gene sb).
- Major mite fecal allergen Der p III.
All the above proteins belong to family S1 in the classification of peptidases [2,E1] and originate from eukaryotic species. It should be noted that bacterial proteases that belong to family S2A are similar enough in the regions of the active site residues that they can be picked up by the same patterns. These proteases are listed below.
- Achromobacter lyticus protease I.
- Lysobacter alpha-lytic protease.
- Streptogrisin A and B (Streptomyces proteases A and B).
- Streptomyces griseus glutamyl endopeptidase II.
- Streptomyces fradiae proteases 1 and 2.

Consensus pattern: [LIVM]-[ST]-A-[STAG]-H-C [H is the active site residue]
Consensus pattern: [DNSTAGC]-[GSTAPIMVQH]-x(2)-G-[DE]-S-G-[GS]-[SAPHV]-[LIVMFYWH]-[LIVMFYSTANQH] [S is the active site residue]
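
As an illustration of how the two active-site signatures can be applied together, here is a minimal Python sketch. It assumes the patterns read [LIVM]-[ST]-A-[STAG]-H-C and [DNSTAGC]-[GSTAPIMVQH]-x(2)-G-[DE]-S-G-[GS]-[SAPHV]-[LIVMFYWH]-[LIVMFYSTANQH]; the names `HIS_SITE`, `SER_SITE` and `trypsin_active_sites` and the toy sequence are hypothetical, and requiring both motifs is an illustrative choice rather than something the text prescribes.

```python
import re

# Hand translations of the two trypsin-family consensus patterns (assumed reading).
HIS_SITE = re.compile(r"[LIVM][ST]A[STAG]HC")
SER_SITE = re.compile(
    r"[DNSTAGC][GSTAPIMVQH].{2}G[DE]SG[GS][SAPHV][LIVMFYWH][LIVMFYSTANQH]"
)

def trypsin_active_sites(seq):
    """Return 0-based (His, Ser) positions, or None unless both motifs occur."""
    his = HIS_SITE.search(seq)
    ser = SER_SITE.search(seq)
    if his is None or ser is None:
        return None
    # The His is the 5th residue of its motif, the Ser the 7th of its motif.
    return his.start() + 4, ser.start() + 6

# Made-up sequence that embeds one copy of each motif.
toy = "GG" + "VTAAHC" + "KKKK" + "DSCQGDSGGPLV" + "GG"
sites = trypsin_active_sites(toy)
```

Reporting the residue position rather than the motif start keeps the bracketed annotation of each pattern (the active site residue) visible in the result.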

[1] Brenner S. Nature 334:528-530(1988). [2] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:19-61(1994). [E1]
[1566] 676. (tsp) Thrombospondin type 1 domain
[1567] [1] Bork P, FEBS Lett 1993;327:125-130

[1568] 677. Tubulin subunits alpha, beta, and gamma signature 

Tubulins [1,2], the major constituent of microtubules, are dimeric proteins which consist of two closely related subunits (alpha and beta). Tubulin binds two molecules of GTP at two different sites (N and E). At the E (Exchangeable) site, GTP is hydrolyzed during incorporation into the microtubule. Near the E site is an invariant region rich in glycines which is found in both chains and which is now [3] said to control the access of the nucleotide to its binding site. A signature pattern was developed from this region. With the exception of the simple eukaryotes, most species express a variety of closely related alpha and beta isotypes. In most species there is a third member of the tubulin family: gamma tubulin. Gamma tubulin is found at microtubule organising centers (MTOC) such as the spindle poles or the centrosome, suggesting that it is involved in the minus-end nucleation of microtubule assembly [4].
Consensus pattern: [SAG]-G-G-T-G-[SA]-G 
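
The pattern syntax used in these entries maps directly onto regular expressions, which is one way such a signature can be scanned against a sequence. The following is a minimal Python sketch under stated assumptions: the function name `prosite_to_regex` and the toy sequence are hypothetical, and only the syntax elements that appear in this section are handled (residue classes in square brackets, excluded residues in braces, `x` for any residue, repeat counts in parentheses, and the `<` and `>` terminus anchors).

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a consensus pattern such as 'G-x(2)-[ST]' into a Python regex."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        if not element:
            continue  # tolerate a trailing '-'
        # Optional anchors: '<' pins the N-terminus, '>' the C-terminus.
        prefix = "^" if element.startswith("<") else ""
        suffix = "$" if element.endswith(">") else ""
        element = element.strip("<>")
        # Split off a repeat count such as (2) or (4,12).
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, lo, hi = m.group(1), m.group(2), m.group(3)
        if core == "x":
            core = "."                       # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"   # excluded residues
        # '[ABC]' and single letters are already valid regex atoms.
        repeat = ""
        if lo:
            repeat = "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)

# Scan a made-up sequence (not a real tubulin) for the tubulin signature.
tubulin = re.compile(prosite_to_regex("[SAG]-G-G-T-G-[SA]-G"))
hit = tubulin.search("AAGGGTGSGKK")
```

A production scanner would also need to handle overlapping hits and ambiguous residue codes; the sketch only covers the notation itself.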

[1] Cleveland D.W., Sullivan K.F. Annu. Rev. Biochem. 54:331-365(1985). [2] Joshi H.C., Cleveland D.W. Cell Motil. Cytoskeleton 16:159-163(1990). [3] Hesse J., Thierauf M., Ponstingl H. J. Biol. Chem. 262:15472-15475(1987). [4] Joshi H.C. BioEssays 15:637-643(1993).

[1569] Tubulin-beta mRNA autoregulation signal

The stability of beta-tubulin mRNAs is autoregulated by their own translation product [1]. Unpolymerized tubulin subunits bind directly (or activate a factor or factors which bind co-translationally) to the nascent N-terminus of beta-tubulin. This binding is transduced through the adjacent ribosomes to activate an RNAse that degrades the polysome-bound mRNA. The recognition element has been shown to be the first four amino acids of beta-tubulin: Met-Arg-Glu-Ile. Mutations to this sequence abolish the autoregulation effect (except for the replacement of Glu by Asp); transposition of this sequence to an internal region of a polypeptide also suppresses the autoregulatory effect.
Consensus pattern: <M-R-[DE]-[IL]
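
Because the recognition element only works at the very N-terminus, the leading `<` of the pattern matters. A minimal Python sketch of that anchoring follows, reading the last position of the pattern as [IL]; the names `AUTOREG_SIGNAL` and `has_autoregulation_signal` and the short test strings are hypothetical.

```python
import re

# '<' in the consensus pattern anchors the match to the N-terminus,
# which corresponds to '^' in a regular expression.
AUTOREG_SIGNAL = re.compile(r"^MR[DE][IL]")

def has_autoregulation_signal(protein: str) -> bool:
    """True if the sequence starts with Met-Arg-[Asp/Glu]-[Ile/Leu]."""
    return AUTOREG_SIGNAL.match(protein) is not None
```

With this definition `has_autoregulation_signal("MREIVHIQ")` and the Glu-to-Asp variant `"MRDIVHIQ"` are accepted, while `"AAMREIVH"`, where the same four residues sit internally, is rejected, mirroring the transposition result described above.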

[1] Cleveland D.W. Trends Biochem. Sci. 13:339-343(1988).

[1570] 678. (tRNA-synt_2c) Aminoacyl-transfer RNA synthetases class-II signatures

Aminoacyl-tRNA synthetases (EC 6.1.1.-) [1] are a group of enzymes which activate amino acids and transfer them to specific tRNA molecules as the first step in protein biosynthesis. In prokaryotic organisms there are at least twenty different types of aminoacyl-tRNA synthetases, one for each different amino acid. In eukaryotes there are generally two aminoacyl-tRNA synthetases for each different amino acid: one cytosolic form and a mitochondrial form. While all these enzymes have a common function, they are widely diverse in terms of subunit size and of quaternary structure. The synthetases specific for alanine, asparagine, aspartic acid, glycine, histidine, lysine, phenylalanine, proline, serine, and threonine are referred to as class-II synthetases [2 to 8] and probably have a common folding pattern in their catalytic domain for the binding of ATP and amino acid which is different to the Rossmann fold observed for the class I synthetases [7]. Class-II tRNA synthetases do not share a high degree of similarity; however, at least three conserved regions are present [2,5,8]. Signature patterns have been derived from two of these regions.

Consensus pattern: [FYH]-R-x-[DE]-x(4,12)-[RH]-x(3)-F-x(3)-[DE]

Consensus pattern: [GSTALVF]-{DENQHRKP}-[GSTA]-[LIVMF]-[DE]-R-[LIVMF]-x-[LIVMSTAG]-[LIVMFY]
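
The first class-II signature combines a variable-length gap with fixed spacers, which makes it a useful case to work through. The Python sketch below assumes the printed pattern reads [FYH]-R-x-[DE]-x(4,12)-[RH]-x(3)-F-x(3)-[DE]; the names `CLASS_II_MOTIF_1` and `gap_length` and the toy sequences are hypothetical.

```python
import re

# Hand translation of [FYH]-R-x-[DE]-x(4,12)-[RH]-x(3)-F-x(3)-[DE] (assumed reading).
CLASS_II_MOTIF_1 = re.compile(r"[FYH]R.[DE].{4,12}[RH].{3}F.{3}[DE]")

def gap_length(seq):
    """Return the length of the variable x(4,12) gap in the first hit, or None."""
    hit = CLASS_II_MOTIF_1.search(seq)
    if hit is None:
        return None
    # Fixed positions: 4 residues before the gap and 9 after it.
    return len(hit.group()) - 13

# Made-up sequences that differ only in the length of the gap.
ok = gap_length("FRAD" + "A" * 5 + "RAAAFAAAD")
too_short = gap_length("FRAD" + "A" * 3 + "RAAAFAAAD")
too_long = gap_length("FRAD" + "A" * 13 + "RAAAFAAAD")
```

Only the first toy sequence is accepted, since its gap of five residues falls inside the 4 to 12 window; the other two fall outside it on either side.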
[1571] [1] Schimmel P. Annu. Rev. Biochem. 56:125-158(1987). [2] Delarue M., Moras D. BioEssays 15:675-687(1993). [3] Schimmel P. Trends Biochem. Sci. 16:1-3(1991). [4] Nagel G.M., Doolittle R.F. Proc. Natl. Acad. Sci. U.S.A. 88:8121-8125(1991). [5] Cusack S., Haertlein M., Leberman R. Nucleic Acids Res. 19:3489-3498(1991). [6] Cusack S. Biochimie 75:1077-1081(1993). [7] Cusack S., Berthet-Colominas C., Haertlein M., Nassar N., Leberman R. Nature 347:249-255(1990). [8] Leveque F., Plateau P., Dessen P., Blanquet S. Nucleic Acids Res. 18:305-312(1990).
[1572] 679. UBA-domain

[1573] The UBA-domain (Ubiquitin associated domain) is a novel sequence motif found in several proteins having connections to ubiquitin and the ubiquitination pathway. The structure of the UBA domain consists of a compact three helix bundle [1]. Number of members: 64

[1574] [1] Structure of a human DNA repair protein UBA domain that interacts with HIV-1 Vpr. Dieckmann T, Withers-Ward ES, Jarosinski MA, Liu CF, Chen IS, Feigon J; Nat Struct Biol 1998;5:1042-1047.
[1575] 680. UBX domain

Domain present in ubiquitin-regulatory proteins. Present in FAF1 and Shp1p. Number of members: 19
[1] The UBA domain: a sequence motif present in multiple enzyme classes of the ubiquitination pathway. Hofmann K, Bucher P; Trends Biochem Sci 1996;21:172-173

[1576] 681. (UCH) Ubiquitin carboxyl-terminal hydrolases family 1 cysteine active site

Ubiquitin carboxyl-terminal hydrolases (UCH) (deubiquitinating enzymes) [1,2] are thiol proteases that recognize and hydrolyze the peptide bond at the C-terminal glycine of ubiquitin. These enzymes are involved in the processing of poly-ubiquitin precursors as well as that of ubiquitinated proteins. There are two distinct families of UCH. The first class consists of enzymes of about 25 Kd and is currently represented by:
- Mammalian isozymes L1 and L3.
- Yeast YUH1.
- Drosophila Uch.
One of the active site residues of class-I UCH [3] is a cysteine. A signature pattern has been derived from the region around that residue.
Consensus pattern: Q-x(3)-N-[SA]-C-G-x(3)-[LIVM](2)-H-[SA]-[LIVM]-[SA] [C is the active site residue]

[1] Jentsch S., Seufert W., Hauser H.-P. Biochim. Biophys. Acta 1089:127-139(1991). [2] D'Andrea A., Pellman D. Crit. Rev. Biochem. Mol. Biol. 33:337-352(1998). [3] Johnston S.C., Larsen C.N., Cook W.J., Wilkinson K.D., Hill C.P. EMBO J. 16:3787-3796(1997). [4] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:461-486(1994).

[1577] 682. Ubiquitin carboxyl-terminal hydrolases family 2 signatures (UCH-1)

Ubiquitin carboxyl-terminal hydrolases (UCH) (deubiquitinating enzymes) [1,2] are thiol proteases that recognize and hydrolyze the peptide bond at the C-terminal glycine of ubiquitin. These enzymes are involved in the processing of poly-ubiquitin precursors as well as that of ubiquitinated proteins. There are two distinct families of UCH. The second class consists of large proteins (800 to 2000 residues) and is currently represented by:
- Yeast UBP1, UBP2, UBP3, UBP4 (or DOA4/SSV7), UBP5, UBP7, UBP9, UBP10, UBP11, UBP12, UBP13, UBP14, UBP15 and UBP16.
- Human tre-2.
- Human isopeptidase T.
- Human isopeptidase T-3.
- Mammalian Ode-1.
- Mammalian Unp.
- Mouse Dub-1.
- Drosophila fat facets protein (gene faf).
- Mammalian faf homolog.
- Drosophila D-Ubp-64E.
- Caenorhabditis elegans hypothetical protein R10E11.3.
- Caenorhabditis elegans hypothetical protein K02C4.3.
These proteins only share two regions of similarity. The first region contains a conserved cysteine which is probably implicated in the catalytic mechanism. The second region contains two conserved histidine residues, one of which is also probably implicated in the catalytic mechanism. Signature patterns for both conserved regions have been developed.

Consensus pattern: G-[LIVMFY]-x(1,3)-[AGC]-[NASM]-x-C-[FYW]-[LIVMC]-[NST]-[SACV]-x-[LIVMS]-Q [C is the putative active site residue]

Consensus pattern: Y-x-L-x-[SAG]-[LIVMFT]-x(2)-H-x-G-x(4,6)-G-H-Y [The two H's are putative active site residues]
[1] Jentsch S., Seufert W., Hauser H.-P. Biochim. Biophys. Acta 1089:127-139(1991). [2] D'Andrea A., Pellman D. Crit. Rev. Biochem. Mol. Biol. 33:337-352(1998). [3] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:461-486(1994).
[1578] 683. Ubiquitin carboxyl-terminal hydrolases family 2 signatures (UCH-2)

Ubiquitin carboxyl-terminal hydrolases (UCH) (deubiquitinating enzymes) [1,2] are thiol proteases that recognize and hydrolyze the peptide bond at the C-terminal glycine of ubiquitin. These enzymes are involved in the processing of poly-ubiquitin precursors as well as that of ubiquitinated proteins. There are two distinct families of UCH. The second class consists of large proteins (800 to 2000 residues) and is currently represented by:
- Yeast UBP1, UBP2, UBP3, UBP4 (or DOA4/SSV7), UBP5, UBP7, UBP9, UBP10, UBP11, UBP12, UBP13, UBP14, UBP15 and UBP16.
- Human tre-2.
- Human isopeptidase T.
- Human isopeptidase T-3.
- Mammalian Ode-1.
- Mammalian Unp.
- Mouse Dub-1.
- Drosophila fat facets protein (gene faf).
- Mammalian faf homolog.
- Drosophila D-Ubp-64E.
- Caenorhabditis elegans hypothetical protein R10E11.3.
- Caenorhabditis elegans hypothetical protein K02C4.3.
These proteins only share two regions of similarity. The first region contains a conserved cysteine which is probably implicated in the catalytic mechanism. The second region contains two conserved histidine residues, one of which is also probably implicated in the catalytic mechanism. Signature patterns for both conserved regions have been developed.

Consensus pattern: G-[LIVMFY]-x(1,3)-[AGC]-[NASM]-x-C-[FYW]-[LIVMC]-[NST]-[SACV]-x-[LIVMS]-Q [C is the putative active site residue]

Consensus pattern: Y-x-L-x-[SAG]-[LIVMFT]-x(2)-H-x-G-x(4,6)-G-H-Y [The two H's are putative active site residues]
[1] Jentsch S., Seufert W., Hauser H.-P. Biochim. Biophys. Acta 1089:127-139(1991). [2] D'Andrea A., Pellman D. Crit. Rev. Biochem. Mol. Biol. 33:337-352(1998). [3] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:461-486(1994).
[1579] 684. UDP-glycosyltransferases signature

UDP glycosyltransferases (UGT) are a superfamily of enzymes that catalyze the addition of the glycosyl group from a UTP-sugar to a small hydrophobic molecule. This family currently consists of:
- Mammalian UDP-glucuronosyl transferases (UDPGT) [1,2]. A large family of membrane-bound microsomal enzymes which catalyze the transfer of glucuronic acid to a wide variety of exogenous and endogenous lipophilic substrates. These enzymes are of major importance in the detoxification and subsequent elimination of xenobiotics such as drugs and carcinogens.
- A large number of putative UDPGT from Caenorhabditis elegans.
- Mammalian 2-hydroxyacylsphingosine 1-beta-galactosyltransferase [3] (also known as UDP-galactose-ceramide galactosyltransferase). This enzyme catalyzes the transfer of galactose to ceramide, a key enzymatic step in the biosynthesis of galactocerebrosides, which are abundant sphingolipids of the myelin membrane of the central nervous system and peripheral nervous system.
- Plants flavonol O(3)-glucosyltransferase. An enzyme [4] that catalyzes the transfer of glucose from UDP-glucose to a flavanol. This reaction is essential and one of the last steps in anthocyanin pigment biosynthesis.
- Baculoviruses ecdysteroid UDP-glucosyltransferase (EC 2.4.1.-) [5] (egt). This enzyme catalyzes the transfer of glucose from UDP-glucose to ecdysteroids, which are insect molting hormones. The expression of egt in the insect host interferes with the normal insect development by blocking the molting process.
- Prokaryotic zeaxanthin glucosyl transferase (gene crtX), an enzyme involved in carotenoid biosynthesis that catalyses the glycosylation reaction which converts zeaxanthin to zeaxanthin-beta-diglucoside.
- Streptomyces macrolide glycosyltransferases [6]. These enzymes specifically inactivate macrolide antibiotics via 2'-O-glycosylation using UDP-glucose.
These enzymes share a conserved domain of about 50 amino acid residues located in their C-terminal section, from which a pattern has been extracted to detect them.
Consensus pattern: [FW]-x(2)-Q-x(2)-[LIVMYA]-[LIMV]-x(4,6)-[LVGAC]-[LVFYA]-[LIVMF]-[STAGCM]-[HMQ]-[STAGC]-G-x(2)-[STAG]-x(3)-[STAGL]-[LIVMFA]-x(4)-[PQR]-[LIVMT]-x(3)-[PA]-x(3)-[DES]-[QEHN]
[1] Dutton G.J. (In) Glucuronidation of drugs and other compounds, Dutton G.J., Ed., pp 1-78, CRC Press, Boca Raton, (1980). [2] Burchell B., Nebert D.W., Nelson D.R., Bock K.W., Iyanagi T., Jansen P.L., Lancet D., Mulder G.J., Chowdhury J.R., Siest G., Tephly T.R., Mackenzie P.A. DNA Cell Biol. 10:487-494(1991). [3] Schulte S., Stoffel W. Proc. Natl. Acad. Sci. U.S.A. 90:10266-10269(1993). [4] Furtek D., Schiefelbein J.W., Johnston R., Nelson O.E. Jr. Plant Mol. Biol. 11:473-481(1988). [5] O'Reilly D.R., Miller L.K. Science 245:1110-1112(1989). [6] Hernandez C., Olano C., Mendez C., Salas J.A. Gene 134:139-140(1993).
[1580] 685. UDP-glucose/GDP-mannose dehydrogenase family

[1581] The UDP-glucose/GDP-mannose dehydrogenases are a small group of enzymes which possess the ability to catalyze the NAD-dependent 2-fold oxidation of an alcohol to an acid without the release of an aldehyde intermediate [2]. Number of members: 55

[1582] [1] Purification and characterization of guanosine diphospho-D-mannose dehydrogenase. A key enzyme in the biosynthesis of alginate by Pseudomonas aeruginosa. Roychoudhury S, May TB, Gill JF, Singh SK, Feingold DS, Chakrabarty AM; J Biol Chem 1989;264:9380-9386. [2] Properties and kinetic analysis of UDP-glucose dehydrogenase from group A streptococci. Irreversible inhibition by UDP-chloroacetol. Campbell RE, Sala RF, van de Rijn I, Tanner ME; J Biol Chem 1997;272:3416-3422.
[1583] 686. Uracil-DNA glycosylase signature 

Uracil-DNA glycosylase (EC 3.2.2.-) (UNG) [1] is a DNA repair enzyme that excises uracil residues from DNA by cleaving the N-glycosylic bond. Uracil in DNA can arise as a result of misincorporation of dUMP residues by DNA polymerase or deamination of cytosine. The sequence of uracil-DNA glycosylase is extremely well conserved [2] in bacteria and eukaryotes as well as in herpes viruses. More distantly related uracil-DNA glycosylases are also found in poxviruses [3]. In eukaryotic cells, UNG activity is found in both the nucleus and the mitochondria. Human UNG1 protein is transported to both the mitochondria and the nucleus [4]. The N-terminal 77 amino acids of UNG1 seem to be required for mitochondrial localization [4], but the presence of a mitochondrial transit peptide has not been directly demonstrated. As a signature for this type of enzyme, the most N-terminal conserved region has been selected. This region contains an aspartic acid residue which has been proposed, based on X-ray structures [5,6], to act as a general base in the catalytic mechanism.

Consensus pattern: [KR]-[LIV]-[LIVC]-[LIVM]-x-G-[QI]-D-P-Y [D is the active site residue]

[1] Sancar A., Sancar G.B. Annu. Rev. Biochem. 57:29-67(1988). [2] Olsen L.C., Aasland R., Wittwer C.U., Krokan H.E., Helland D.E. EMBO J. 8:3121-3125(1989). [3] Upton C., Stuart D.T., McFadden G. Proc. Natl. Acad. Sci. U.S.A. 90:4518-4522(1993). [4] Slupphaug G., Markussen F.-H., Olsen L.C., Aasland R., Aarsaether N., Bakke O., Krokan H.E., Helland D.E. Nucleic Acids Res. 21:2579-2584(1993). [5] Savva R., McAuley-Hecht K., Brown T., Pearl L. Nature 373:487-493(1995). [6] Mol C.D., Arvai A.S., Slupphaug G., Kavli B., Alseth I., Krokan H.E., Tainer J.A. Cell 80:869-878(1995). [7] Muller S.J., Caradonna S. Biochim. Biophys. Acta 1088:197-207(1991). [8] Meyer-Siegler K., Mauro D.J., Seal G., Wurzer J., DeRiel J.K., Sirover M.A. Proc. Natl. Acad. Sci. U.S.A. 88:8460-8464(1991). [9] Muller S.J., Caradonna S. J. Biol. Chem. 268:1310-1319(1993). [10] Barnes D.E., Lindahl T., Sedgwick B. Curr. Opin. Cell Biol. 5:424-433(1993).

[1584] 687. Uncharacterized protein family UPF0001 signature

The following uncharacterized proteins have been shown [1] to share regions of similarities:
- Yeast chromosome II hypothetical protein YBL036c.
- Caenorhabditis elegans hypothetical protein F09E5.8.
- Bacillus subtilis hypothetical protein ylmE.
- Escherichia coli hypothetical protein yggS and HI0090, the corresponding Haemophilus influenzae protein.
- Helicobacter pylori hypothetical protein HP0395.
- Mycobacterium tuberculosis hypothetical protein MtCY270.20.
- Synechocystis strain PCC 6803 hypothetical protein slr0556.
- A Pseudomonas aeruginosa hypothetical protein in pilT 5'region.
- A Vibrio alginolyticus hypothetical protein in pilT 5'region.
These are proteins of from 25 to 30 Kd which contain a number of conserved regions. The best conserved region, which is located in the first third of these proteins, has been selected as a signature pattern.

Consensus pattern: [FW]-H-[FM]-[IV]-G-x-[LIV]-Q-x-[NKR]-K-x(3)-[LIV]
[1] Bairoch A., Rudd K.E. Unpublished observations (1996).

[1585] 688. Uncharacterized protein family UPF0003 signature

The following uncharacterized proteins have been shown [1] to share regions of similarities:
- Escherichia coli protein aefA.
- Escherichia coli hypothetical protein yggS.
- Escherichia coli hypothetical protein yjeP and HI0195.1, the corresponding Haemophilus influenzae protein.
- Escherichia coli hypothetical protein ynaI.
- Bacillus subtilis hypothetical protein yhdY.
- Helicobacter pylori hypothetical protein HP0415.
- Synechocystis strain PCC 6803 hypothetical protein slr0639.
- Archaeoglobus fulgidus hypothetical protein AF1546.
- Methanococcus jannaschii hypothetical protein MJ0170.
- Methanococcus jannaschii hypothetical protein MJ1143.
The size of these proteins ranges from 30 to 120 Kd. They all contain a number of transmembrane regions. The best conserved region, which is located in and just after the last potential transmembrane region, has been selected as a signature pattern.

Consensus pattern: G-[STIF]-V-x(2)-[LIVM]-x(6)-[LIVMF]-x(3)-[DQ]-x(3)-[LIV]-x-[LIV]-P-M-x(2)-[LIVMF]-[LIVFSTA]-x(5)-N

[1] Bairoch A. Unpublished observations (1997).

[1586] 689. Uncharacterized protein family UPF0004 signature

The following uncharacterized proteins have been shown [1] to share regions of similarities:
- Escherichia coli hypothetical protein yliG.
- Escherichia coli hypothetical protein yleA and HI0019, the corresponding Haemophilus influenzae protein.
- Bacillus subtilis hypothetical protein yqeV.
- Helicobacter pylori hypothetical protein HP0269.
- Helicobacter pylori hypothetical protein HP0285.
- Mycoplasma iowae hypothetical protein in 16S RNA 5'region.
- Mycobacterium leprae hypothetical protein B2235_C2_195.
- Pseudomonas aeruginosa hypothetical protein in hemL 3'region.
- Synechocystis strain PCC 6803 hypothetical protein slr0082.
- Synechocystis strain PCC 6803 hypothetical protein sll0998.
- Methanococcus jannaschii hypothetical protein MJ0865.
- Methanococcus jannaschii hypothetical protein MJ0867.
- Caenorhabditis elegans hypothetical protein F25B5.5.
The size of these proteins ranges from 47 to 61 Kd. They contain six conserved cysteines, three of which are clustered in a region that can be used as a signature pattern.

Consensus pattern: [LIVM]-x-[LIVMT]-x(2)-G-C-x(3)-C-[STAN]-[FY]-C-x-[LIVM]-x(4)-G
[1] Bairoch A. Unpublished observations (1997).
[1587] 690. Uncharacterized protein family UPF0005 signature

The following proteins seem to be evolutionarily related [1]:
- Mammalian protein TEGT (Testis Enhanced Gene Transcript).
- Escherichia coli hypothetical protein yccA and HI0044, the corresponding Haemophilus influenzae protein.
- A probable Pseudomonas aeruginosa ortholog of yccA.
These are proteins of about 25 Kd which seem to contain seven transmembrane domains. A signature pattern has been developed that corresponds to a region that starts with the beginning of the third transmembrane domain and ends in the middle of the fourth one.
Consensus pattern: G-[LIVM](2HSA]-x{5.8M3-x{2HLfVM^ 
[L!VM]f'2)-F 

[1] Walter L., Marynen P., Szpirer J., Levan G., Guenther E. Genomics 28:301-304(1995).
[1588] 691. Uncharacterized protein family UPF0006 signatures

The following uncharacterized proteins have been shown [1] to share regions of similarities:
- Yeast chromosome II hypothetical protein YBL055c.
- Escherichia coli hypothetical protein ycfH and HI0454, the corresponding Haemophilus influenzae protein.
- Escherichia coli hypothetical protein yigW.
- Escherichia coli hypothetical protein yjjV and HI0081, the corresponding Haemophilus influenzae protein.
- Bacillus subtilis hypothetical protein yabD.
- Haemophilus influenzae hypothetical protein HI1654.
- Mycoplasma genitalium hypothetical protein MG009.
These are proteins of from 24 to 47 Kd which contain a number of conserved regions. They can be picked up in the database by the following patterns.

Consensus pattern: [LIVMFY](2)-D-[STA]-H-x-H-[LIVMF]-[DN]
Consensus pattern: P-[LIVM]-x-[LIVM]-H-x-R-x-[TA]-x-[DE]

Consensus pattern: [LVSA]-[LIVA]-x(2)-[LIVM]-[PS]-x(3)-L-[LIVM]-[LIVMS]-E-T-D-x-P
[1] Bairoch A., Rudd K.E. Unpublished observations (1995).
[1589] 692. Uncharacterized protein family UPF0007 signature

The following proteins seem to be evolutionarily related [1]:
- Escherichia coli hypothetical protein ygbP and HI0672, the corresponding Haemophilus influenzae protein.
- Bacillus subtilis hypothetical protein yacM.
- Mycobacterium tuberculosis hypothetical protein MtCY06G11.29c.
- Synechocystis strain PCC 6803 hypothetical protein slr0951.
- A Rhodobacter capsulatus hypothetical protein in nifR3 5'region.
Except for the Rhodobacter protein, which contains a C-terminal extension, all these proteins have from 225 to 236 amino acids. They are hydrophilic proteins that can be picked up in the database by the following pattern.
Consensus pattern: V-L-[IV]-H-D-[GA]-A-R

[1] Bairoch A. Unpublished observations (1997).

[1590] 693. Uncharacterized protein family UPF0015 signature

The following uncharacterized proteins have been shown [1] to share regions of similarities:
- Yeast chromosome II hypothetical protein YBR002c.
- Yeast chromosome XIII hypothetical protein YMR101c.
- Escherichia coli hypothetical protein yaeU and HI0920, the corresponding Haemophilus influenzae protein.
- Helicobacter pylori hypothetical protein HP1221.
- Mycobacterium leprae hypothetical protein B1937_F2_65.
- A Corynebacterium glutamicum hypothetical protein in aroF 3'region.
- A Streptomyces fradiae hypothetical protein in transposon Tn4556.
- Synechocystis strain PCC 6803 hypothetical protein sll0505.
- Methanococcus jannaschii hypothetical protein MJ1372.
These are proteins of about 26 to 40 Kd whose central region is well conserved. They can be picked up in the database by the following pattern.

Consensus pattern: [DEH]-[LIVMF](3)-R-T-[SG]-G-x(2)-R-x-S-x-[FY]-[LIVM](2)-W-Q

[1] Wolfe K.H., Lohan A.J.E. Yeast 10:S41-S46(1994).

[1591] 694. Uncharacterized protein family UPF0016 signature

The following uncharacterized proteins have been shown [1] to share regions of similarities:
- Yeast hypothetical protein YBR187w.
- Fission yeast hypothetical protein SpAC17G8.08c.
- Mouse protein pFT27.
- Synechocystis strain PCC 6803 hypothetical protein sll0615.
These are hydrophobic proteins of 200 to 320 amino acids that seem to contain six or seven transmembrane domains. A conserved region which seems, in the eukaryotic proteins of this family, to directly follow the second transmembrane domain has been selected as a signature pattern.
Consensus pattern: E-[LIVM]-G-D-K-T-F-[LIVMF](2)-A
[1] Bairoch A. Unpublished observations (1996).
[1592] 695. Uncharacterized protein family UPF0021 signature

The following uncharacterized proteins have been shown [1] to share regions of similarities:
- Yeast chromosome VII hypothetical protein YGL211w.
- Dictyostelium discoideum protein veg136.
- Methanococcus jannaschii hypothetical proteins MJ1157 and MJ1478.
These are proteins of from 300 to 360 residues. They can be picked up in the database by the following pattern, which is located in their N-terminal section.
Consensus pattern: C-K-x(2)-F-x(4)-E-x(22,23)-S-G-G-K-D

[1] Bairoch A. Unpublished observations (1997).

[1593] 696. Uncharacterized protein family UPF0023 signature

The following uncharacterized proteins have been shown [1] to share regions of similarities:
- Mouse protein 22A3.
- Yeast chromosome XII hypothetical protein YLR022c.
- Caenorhabditis elegans hypothetical protein W06E11.4.
- Methanococcus jannaschii hypothetical protein MJ0592.
These are hydrophilic proteins of about 30 Kd. They can be picked up in the database by the following pattern.

[1594] Consensus pattern: D-x-D-E-[LIV]-L-x(4)-V-F-x(3)-S-K-G
[1595] [1] Bairoch A. Unpublished observations (1997).

[1596] 697. Uncharacterized protein family UPF0024 signature

The following uncharacterized proteins have been shown [1] to share regions of similarities:
- Escherichia coli hypothetical protein ygbO and HI0701, the corresponding Haemophilus influenzae protein.
- Helicobacter pylori hypothetical protein HP0926.
- Yeast chromosome XV hypothetical protein YOR243c.
- Caenorhabditis elegans hypothetical protein B0024.11.
- Methanococcus jannaschii hypothetical proteins MJ0588 and MJ1384.
These are hydrophilic proteins of from 39 to 77 Kd. They can be picked up in the database by the following pattern.

[1597] Consensus pattern: G-x-K-D-[KR]-x-A-[LV]-T-x-Q-x-[LIVT]-[SGG]
[1] Bairoch A. Unpublished observations (1997).

[1598] 698. Uncharacterized protein family UPF0025 signature

The following uncharactenzed proteins have been shown [ 1 ] to share regions ot similarities. - Escherichia coli hypo- 
thetical protein yfcE, - Bacillus subtil is hypothetical protein ysnB. - Mycoplasma genitalium and pneumoniae hypothet- 
ical protein MG2Q7. - Methanococcus jannaschii hypothetical proteins MJQ823 and MJ0936. These are hydrophilic 
proteins of about 20 Kd They can be picked up in thedatabase by the following pattern. 

35 Consensus pattern: D-V~[LIV]-x(2)-G-H-{ST]-H-x{12HLIVMF]-N-P-G 
[ 1] Bairoch A Unpublished observations (1997). 
[1599] 699. Uncharacterized protein family UPF0029 signature

The following uncharacterized proteins have been shown [1] to share regions of similarities: - Yeast chromosome III
hypothetical protein YCR59c. - Yeast chromosome IV hypothetical protein YDL177c. - Escherichia coli hypothetical
protein yigZ and HI0722, the corresponding Haemophilus influenzae protein. - Bacillus subtilis hypothetical protein
yvyE. - A Thermus aquaticus hypothetical protein in pol 5' region. These proteins can be picked up in the database by
the following pattern.

Consensus pattern: G-x(2)-[LIVM](2)-x(2)-[LIVM]-x(4)-[LIVM]-x(5)-[LIVM](2)-x-R-[FYW](2)-G-G-x(2)-[LIVM]-G
[ 1] Koonin E.V., Bork P., Sander C. EMBO J. 13:493-503(1994).

[1600] 700. Uncharacterized protein family UPF0030 signature

The following uncharacterized proteins have been shown [1] to be highly similar: - Yeast chromosome VI hypothetical
protein YFL060c. - Yeast chromosome XIII hypothetical protein YMR095c. - Yeast chromosome XIV hypothetical protein
YNL334c. - Bacillus subtilis hypothetical protein yaaE. - Haemophilus influenzae hypothetical protein HI1648. -
Methanococcus jannaschii hypothetical protein MJ1661. These are hydrophilic proteins of about 19 to 25 Kd. They can be
picked up in the database by the following pattern.

Consensus pattern: [GA]-L-I-[LIV]-P-G-G-E-S-T-[STA]

[ 1] Bairoch A. Unpublished observations (1997). 

[1601] 701. Uncharacterized protein family UPF0032 signature

The following uncharacterized proteins have been shown [1] to share regions of similarities: - Escherichia coli
hypothetical protein yigU and HI0188, the corresponding Haemophilus influenzae protein. - Bacillus subtilis hypothetical
protein ycbT. - Mycobacterium tuberculosis hypothetical protein MtCY49.33c and U2126A, the corresponding
Mycobacterium leprae protein. - Synechocystis strain PCC 6803 hypothetical protein sll0194. - Odontella sinensis and
Porphyra purpurea chloroplast hypothetical protein ycf43. These proteins have from 245 to 317 amino acids and seem to
contain at least six or seven transmembrane regions. A conserved region located in the central section of these proteins
has been developed as a signature pattern.

Consensus pattern: Y-x(2)-F-[LIVMA](2)-x-L-x(4)-G-x(2)-F-[EQ]-[LIVMF]-P-[LIVM]-
[ 1] Bairoch A., Rudd K.E. Unpublished observations (1996).

[1602] 702. Uncharacterized protein family UPF0034 signature

The following uncharacterized proteins have been shown [1] to share regions of similarities: - Escherichia coli
hypothetical protein yhdG and HI0979, the corresponding Haemophilus influenzae protein. - Escherichia coli hypothetical
protein yjbN and HI0634, the corresponding Haemophilus influenzae protein. - Escherichia coli hypothetical protein
yohI and HI0270, the corresponding Haemophilus influenzae protein. - Bacillus subtilis hypothetical protein yacF. -
Rhodobacter capsulatus protein nifR3 and related proteins in Azospirillum brasilense and Rhizobium leguminosarum.
- Synechocystis strain PCC 6803 hypothetical protein sll0644. - Synechocystis strain PCC 6803 hypothetical protein
sll0926. - Caenorhabditis elegans hypothetical protein C45G9.2. - Yeast protein SMM1. - Yeast hypothetical protein
YLR401c. - Yeast hypothetical protein YLR405w. - Yeast hypothetical protein YML080w. Although it has been proposed
[2] that Rhodobacter capsulatus nifR3 is a transcriptional regulatory protein, it is believed that these proteins constitute
a family of enzymes whose active site could include a conserved cysteine, which has been used as the central part of
a signature pattern.

Consensus pattern: [LIVM]-[DNG]-[LIVM]-N-x-G-C-P-x(3)-[LIVMASQ]-x(5)-G-[SAC]

[ 1] Bairoch A., Rudd K.E. Unpublished observations (1996). [ 2] Foster-Hartnett D., Cullen P.J., Gabbert K.K., Kranz
R.G. Mol. Microbiol. 8 (1993).

[1603] 703. Uncharacterized protein family UPF0038 signature

The following uncharacterized proteins have been shown [1] to share regions of similarities: - Escherichia coli
hypothetical protein yacE and HI0890, the corresponding Haemophilus influenzae protein. - Mycobacterium tuberculosis
hypothetical protein MtCY01B2.23 and O410, the corresponding Mycobacterium leprae protein. - Synechocystis strain
PCC 6803 hypothetical protein ^iiO'53. - Other hypothetical proteins from Aeromonas hydrophila, Bacteroides nodosus,
Neisseria gonorrhoeae, Pseudomonas putida, Thermus thermophilus and Xanthomonas campestris. - Human
hypothetical protein pOVO. - Yeast hypothetical protein YDR196c. - Caenorhabditis elegans hypothetical protein
T05G? ?. These proteins all contain, in their N-terminal extremity, an ATP/GTP-binding motif 'A' (P-loop) (see
PDOC00017). The size of these proteins ranges from 200 to 2&0 residues (with the exception of the Mycobacterial
sequences, which are 410 residues long). A conserved region, some 60 residues away from the ATP-binding P-loop,
has been developed as a signature pattern.

Consensus pattern: G-x-[LI]-x-R-x(2)-L-x(4)-F-x(8)-[LIV]-x(^)-P-x-[LIV]-
[ 1] Rudd K.E., Bairoch A. Unpublished observations (1997).

[1604] 704. Ubiquitin-conjugating enzymes active site

Ubiquitin-conjugating enzymes (UBC or E2 enzymes) [1,2,3] catalyse the covalent attachment of ubiquitin to target
proteins. An activated ubiquitin moiety is transferred from an ubiquitin-activating enzyme (E1) to E2, which later ligates
ubiquitin directly to substrate proteins with or without the assistance of 'N-end' recognizing proteins. In most
species there are many forms of UBC (at least 9 in yeast), which are implicated in diverse cellular functions. A cysteine
residue is required for ubiquitin-thiolester formation. There is a single conserved cysteine in UBC's and the region
around that residue is conserved in the sequence of known UBC isozymes. That region has been used as a signature
pattern.

Consensus pattern: [FYWLSP]-H-[PC]-[NH]-[LIV]-x(3,4)-G-x-[LIV]-C-[LIV]-x-[LIV] [C is the active site residue]
[ 1] Jentsch S., Seufert W., Sommer T., Reins H.-A. Trends Biochem. Sci. 15:195-198(1990). [ 2] Jentsch S., Seufert
W., Hauser H.-P. Biochim. Biophys. Acta 1089:127-139(1991). [ 3] Hershko A. Trends Biochem. Sci. 16:265-268(1991).
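
As an illustration of how the annotated active-site residue can be located once the signature matches, the following sketch hand-translates the pattern into a regular expression and captures the cysteine. It is not part of the original disclosure: it assumes the pattern reads [FYWLSP]-H-[PC]-[NH]-[LIV]-x(3,4)-G-x-[LIV]-C-[LIV]-x-[LIV], and the function name and the test peptide are hypothetical.

```python
import re

# Hand translation of the UBC consensus pattern (assumed reading, see lead-in).
# The parenthesized C captures the active-site cysteine.
UBC_ACTIVE_SITE = re.compile(r"[FYWLSP]H[PC][NH][LIV].{3,4}G.[LIV](C)[LIV].[LIV]")


def find_active_site_cysteines(sequence):
    """Return the 1-based position of the cysteine in every non-overlapping match."""
    return [m.start(1) + 1 for m in UBC_ACTIVE_SITE.finditer(sequence)]


# Hypothetical peptide used for illustration only.
toy_peptide = "MSYHPNINANGNICLDILKA"
print(find_active_site_cysteines(toy_peptide))
```

Capturing the residue inside the match, rather than searching for any cysteine, keeps the reported position tied to the conserved region described above.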

[1605] 705. Uroporphyrinogen decarboxylase signatures

Uroporphyrinogen decarboxylase (URO-D), the fifth enzyme of the heme biosynthetic pathway, catalyzes the sequential
decarboxylation of the four acetyl side chains of uroporphyrinogen to yield coproporphyrinogen [1]. URO-D deficiency
is responsible for the human genetic diseases familial porphyria cutanea tarda (fPCT) and hepatoerythropoietic
porphyria (HEP). The sequence of URO-D has been well conserved throughout evolution. The best conserved region is
located in the N-terminal section; it contains a perfectly conserved hexapeptide. There are two arginine residues in this
hexapeptide which could be involved in the binding, via salt bridges, to the carboxyl groups of the propionate side chains
of the substrate. This region has been used as a signature pattern. A second signature pattern is based on another
well conserved region which is located in the central section of the protein.

Consensus pattern: P-x-W-x-M-R-Q-A-G-R

Consensus pattern: G-F-[STAGCV]-[STAGC]-x-P-[FYW]-T-[LV]-x(2)-Y-x(2)-[AE]-[GK]
[ 1] Garey J.R., Labbe-Bois R., Chelstowska A., Rytka J., Harrison L., Kushner J., Labbe P. Eur. J. Biochem.
205:1011-1016(1992).

[1606] 706. ubiE/COQ5 methyltransferase family signatures

The following methyltransferases have been shown [1] to share regions of similarities: - Escherichia coli ubiE, which
is involved in both ubiquinone and menaquinone biosynthesis and which catalyzes the S-adenosylmethionine
dependent methylation of 2-polyprenyl-6-methoxy-1,4-benzoquinol into 2-polyprenyl-3-methyl-6-methoxy-1,4-benzoquinol
and of demethylmenaquinol into menaquinol. - Yeast COQ5, a ubiquinone biosynthesis methyltransferase. - Bacillus
subtilis spore germination protein C2 (gene gerCB or gerC2), a probable menaquinone biosynthesis methyltransferase.
- Lactococcus lactis gerC2 homolog. - Caenorhabditis elegans hypothetical protein ZK652.9. - Leishmania donovani
amastigote-specific protein A41. These are hydrophilic proteins of about 30 Kd (except for ZK652.9, which is 65 Kd).
They can be picked up in the database by the following patterns.
Consensus pattern: Y-D-x-M-N-x(2)-[LIVM]-S-x(3)-H-x(2)-W
Consensus pattern: R-V-[LIVM]-K-[PV]-G-G-x-[LIVMF]-x(2)-[LIVM]-E-

[ 1] Lee P.T., Hsu A.Y., Ha H.T., Clarke C.F. J. Bacteriol. 179:1748-1754(1997).
[1607] 707. Uricase signature

Uricase (urate oxidase) [1] is the peroxisomal enzyme responsible for the degradation of urate into allantoin. Some
species, like primates and birds, have lost the gene for uricase and are therefore unable to degrade urate. Uricase is
a protein of 300 to 400 amino acids. A highly conserved region located in the central part of the sequence has been
used as a signature pattern.

Consensus pattern: [LV]-x-[LV]-[LIV]-K-[STV]-[ST]-x-[SN]-x-F-x(2)-[FY]-x(4)-[FY]-x(2)-L-x(5)-R
[ 1] Motojima K., Kanaya S., Goto S. J. Biol. Chem. 263:16677-16681(1988).
[1608] 708. Universal stress protein family (Usp)

[1609] By a wide range of stress conditions. Members of the Usp family are predicted to be related to the MADS box
proteins (transcript_fact) and bind to DNA [2]. Number of members: 39

[1] Expression and role of the universal stress protein, UspA, of Escherichia coli during growth arrest. Nystrom T,
Neidhardt FC; Mol Microbiol 1994;11:537-544.

[2] Sequence analysis of eukaryotic developmental proteins: ancient and novel domains. Mushegian AR, Koonin
EV; Genetics 1998;144:817-828.

[1610] 709. Ubiquitin domain signature and profile

Ubiquitin [1,2,3] is a protein of seventy six amino acid residues, found in all eukaryotic cells and whose sequence is
extremely well conserved from protozoan to vertebrates. It plays a key role in a variety of cellular processes, such as
ATP-dependent selective degradation of cellular proteins, maintenance of chromatin structure, regulation of gene
expression, stress response and ribosome biogenesis. In most species, there are many genes coding for ubiquitin.
However, they can be classified into two classes. The first class produces polyubiquitin molecules consisting of exact head
to tail repeats of ubiquitin. The number of repeats is variable (up to twelve in a Xenopus gene). In the majority of
polyubiquitin precursors, there is a final amino acid after the last repeat. The second class of genes produces precursor
proteins consisting of a single copy of ubiquitin fused to a C-terminal extension protein (CEP). There are two types of
CEP proteins and both seem to be ribosomal proteins. Ubiquitin is a globular protein, the last four C-terminal residues
(Leu-Arg-Gly-Gly) extending from the compact structure to form a 'tail', important for its function. The latter is mediated
by the covalent conjugation of ubiquitin to target proteins, by an isopeptide linkage between the C-terminal glycine and
the epsilon amino group of lysine residues in the target proteins. There are a number of proteins which are evolutionary
related to ubiquitin: - Ubiquitin-like proteins from baculoviruses as well as in some strains of bovine viral diarrhea viruses
(BVDV). These proteins are highly similar to their eukaryotic counterparts. - Mammalian protein GdX [4]. GdX is
composed of two domains, a N-terminal ubiquitin-like domain of 74 residues and a C-terminal domain of 83 residues with
some similarity with the thyroglobulin hormonogenic site. - Mammalian protein FAU [5]. FAU is a fusion protein which
consists of a N-terminal ubiquitin-like protein of 74 residues fused to ribosomal protein S30. - Mouse protein NEDD-8
[6], a ubiquitin-like protein of 81 residues. - Human protein BAT3, a large fusion protein of 1132 residues that contains
a N-terminal ubiquitin-like domain. - Caenorhabditis elegans protein ubl-1 [7]. Ubl-1 is a fusion protein which consists
of a N-terminal ubiquitin-like protein of 70 residues fused to ribosomal protein S27A. - Yeast DNA repair protein RAD23
[8]. RAD23 contains a N-terminal domain that seems to be distantly, yet significantly, related to ubiquitin. - Mammalian
RAD23-related proteins RAD23A and RAD23B. - Mammalian BCL-2 binding athanogene-1 (BAG-1). BAG-1 is a protein
of 274 residues that contains a central ubiquitin-like domain. - Human spliceosome associated protein 114 (SAP 114
or SF3A120). - Yeast protein DSK2, a protein involved in spindle pole body duplication and which contains a N-terminal
ubiquitin-like domain. - Human protein CKAP1/TFCB, Schizosaccharomyces pombe protein alp11 and Caenorhabditis
elegans hypothetical protein F53F4.3. These proteins contain a N-terminal ubiquitin domain and a C-terminal CAP-
Gly domain. - Schizosaccharomyces pombe hypothetical protein SpAC26A3.16. This protein contains a N-terminal
ubiquitin domain. - Yeast protein SMT3. - Human ubiquitin-like proteins SMT3A and SMT3B. - Human ubiquitin-like
protein SMT3C (also known as PIC1, Ubl1, Sumo-1, Gmp-1 or Sentrin). This protein is involved in targeting ranGAP1
to the nuclear pore complex protein ranBP2. - SMT3-like proteins in plants and Caenorhabditis elegans. To identify
ubiquitin and related proteins, a pattern has been developed based on conserved positions in the central section of
the sequence. A profile was also developed that spans the complete length of the ubiquitin domain.
Consensus pattern: K-x(2)-[LIVM]-x-[DESAK]-x(3)-[LIVM]-[PA]-x(3)-Q-x-[LIVM]-[LIVMC]-[LIVMFY]-x-G-x(4)-[DE]
[ 1] Jentsch S., Seufert W., Hauser H.-P. Biochim. Biophys. Acta 1089:127-139(1991). [ 2] Monia B.P., Ecker D.J., Crooke
S.T. Bio/Technology 8:209-215(1990). [ 3] Finley D., Varshavsky A. Trends Biochem. Sci. 10:343-347(1985). [ 4] Filippi
M., Tribioli C., Toniolo D. Genomics 7:453-457(1990). [ 5] Olvera J., Wool I.G. J. Biol. Chem. 268:17967-17974(1993).
[ 6] Kumar S., Yoshida Y., Noda M. Biochem. Biophys. Res. Commun. 195:393-399(1993). [ 7] Jones D., Candido E.
P. J. Biol. Chem. 268:19545-19551(1993). [ 8] Melnick L., Sherman F. J. Mol. Biol. 233:372-388(1993).
[1611] 710. VHS domain 

[1612] Domain present in VPS-27, Hrs and STAM. Number of members 2 V
[1613] 711. Vinculin family signatures

Vinculin [1] is a eukaryotic protein that seems to be involved in the attachment of the actin-based microfilaments to the
plasma membrane. Vinculin is located at the cytoplasmic side of focal contacts or adhesion plaques. In addition to actin,
vinculin interacts with other structural proteins such as talin and alpha-actinins. Vinculin is a large protein of 116 Kd
(about 1000 residues). Structurally the protein consists of an acidic N-terminal domain of about 90 Kd separated
from a basic C-terminal domain of about 25 Kd by a proline-rich region of about 50 residues. The central part of the
N-terminal domain consists of a variable number (3 in vertebrates, 2 in Caenorhabditis elegans) of repeats of a 110
amino acids domain. Catenins [2] are proteins that associate with the cytoplasmic domain of a variety of cadherins.
The association of catenins to cadherins produces a complex which is linked to the actin filament network, and which
seems to be of primary importance for cadherins cell-adhesion properties. Three different types of catenins seem to
exist: alpha, beta, and gamma. Alpha-catenins are proteins of about 100 Kd which are evolutionary related to vinculin.
In terms of their structure, the most significant differences are the absence, in alpha-catenin, of the repeated domain and
of the proline-rich segment. Two signature patterns for this family of proteins have been developed. The first pattern is
located in the N-terminal section of both vinculin and alpha-catenins and is part, in vinculin, of a domain that seems to
be involved with the interaction with talin. The second pattern is based on a conserved region in the N-terminal part of
the repeated domain of vinculin.
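
The relation between the masses and residue counts quoted for vinculin can be checked with a small calculation. The sketch below is not part of the original disclosure; it assumes a rule-of-thumb average residue mass of about 110 Da, which is an approximation introduced here and not a figure from the text, and the function name is hypothetical.

```python
# Assumed rule-of-thumb average mass of one amino acid residue, in daltons.
# This constant is an approximation introduced for illustration only.
AVERAGE_RESIDUE_MASS_DA = 110.0


def estimated_residues(mass_kd, residue_mass_da=AVERAGE_RESIDUE_MASS_DA):
    """Estimate the number of residues in a protein from its mass in kilodaltons."""
    return round(mass_kd * 1000.0 / residue_mass_da)


# Masses quoted in the vinculin description above.
for label, mass_kd in [("vinculin", 116),
                       ("N-terminal domain", 90),
                       ("C-terminal domain", 25)]:
    print(label, estimated_residues(mass_kd))
```

Under this assumption, 116 Kd corresponds to roughly a thousand residues, in line with the figure given in the text.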

Consensus pattern: [KR]-x-[LIVMF]-x(3)-[LIVMA]-x(2)-[LIVM]-x(6)-R-Q-Q-E-L
Consensus pattern: [LIVM]-x-[QA]-A-x(2)-W-[IL]-x-[DN]-P

[ 1] Otto J.J. Cell Motil. Cytoskeleton 16:1-6(1990). [ 2] Herrenknecht K., Ozawa M., Eckerskorn C., Lottspeich F., Lenter
M., Kemler R. Proc. Natl. Acad. Sci. U.S.A. 88:9156-9160(1991).
[1614] 712. (Vitellogenin_N) Lipoprotein amino terminal region

[1615] This family contains regions from vitellogenin, microsomal triglyceride transfer protein and apolipoprotein B-
100. These proteins are all involved in lipid transport [1]. This family contains the LV1n chain from lipovitellin, which
contains two structural domains. Number of members: 33

[1616] [1] The structural basis of lipid interactions in lipovitellin, a soluble lipoprotein. Anderson TA, Levitt DG,
Banaszak LJ; Structure 1998;6:895-909.

[1617] 713. (VMSA) Major surface antigen from hepadnavirus
[1618] 714. ssDNA binding protein (Viral_DNA_bp)
This protein is found in herpesviruses and is needed for replication.
[1619] 715. (Voltage_CLC) Voltage gated chloride channels

[1620] This family of ion channels contains 10 or 12 transmembrane helices. Each protein forms a single pore. It
has been shown that some members of this family form homodimers. These proteins contain two CBS domains.

[1] Schmidt-Rose T, Jentsch TJ; J Biol Chem 1997;272:20515-20521.

[2] Zhang J, George AL Jr, Griggs RC, Fouad GT, Roberts J, Kwiecinski H, Connolly AM, Ptacek LJ; Neurology
1996;47:993-998.

[1621] 716. von Willebrand factor type A domain (vwa)

More von Willebrand factor type A domains? Sequence similarities with malaria thrombospondin-related anonymous
protein, dihydropyridine-sensitive calcium channel and inter-alpha-trypsin inhibitor.
Bork P, Rohde K;
Biochem J 1991;279:908-911.

1. RUGGERI, Z.M. and WARE, J.
von Willebrand factor.
FASEB J. 7 308-316 (1993).

2. COLOMBATTI, A., BONALDO, P. and DOLIANA, R.
Type A modules: interacting domains found in several non-fibrillar collagens and in other extracellular matrix proteins.
MATRIX 13 297-306 (1993).

3. PERKINS, S.J., SMITH, K.F., WILLIAMS, S.C., HARIS, P.I., CHAPMAN, D. and SIM, R.B.
The secondary structure of the von Willebrand factor type A domain in factor B of human complement by Fourier
transform infrared spectroscopy. Its occurrence in collagen types VI, VII, XII and XIV, the integrins and other proteins
by averaged structure predictions.
J.MOL.BIOL. 238 104-119 (1994).

4. BORK, P. and ROHDE, K.
More von Willebrand factor type A domains? Sequence similarities with malaria thrombospondin-related anonymous
protein, dihydropyridine-sensitive calcium channel and inter-alpha-trypsin inhibitor.
BIOCHEM.J. 279 908-910 (1991).

5. EDWARDS, Y.J.K. and PERKINS, S.J.
The protein fold of the von Willebrand factor type A domain is predicted to be similar to the open twisted beta-
sheet flanked by alpha-helices found in human ras-p21.
FEBS LETT. 358 283-286 (1995).

6. LEE, J.O., RIEU, P., ARNAOUT, M.A. and LIDDINGTON, R.
Crystal structure of the A domain from the alpha subunit of integrin CR3 (CD11b/CD18).
CELL 80 631-638 (1995).

7. QU, A. and LEAHY, D.J.
Crystal structure of the I-domain from the CD11a/CD18 (LFA-1, alpha L beta 2) integrin.
PROC.NATL.ACAD.SCI.USA 92 10277-10281 (1995).

[1622] The von Willebrand factor is a large multimeric glycoprotein found in blood plasma. Mutant forms are involved
in the aetiology of bleeding disorders. In von Willebrand factor, the type A domain (vWF) is the prototype for a protein
superfamily. The vWF domain is found in various plasma proteins: complement factors B, C2, CR3 and CR4; the
integrins (I-domains); collagen types VI, VII, XII and XIV; and other extracellular proteins [2-4]. Proteins that incorporate
vWF domains participate in numerous biological events (e.g., cell adhesion, migration, homing, pattern formation, and
signal transduction), involving interaction with a large array of ligands [2]. Secondary structure prediction from 75
aligned vWF sequences has revealed a largely alternating sequence of alpha-helices and beta-strands [3]. Fold
recognition algorithms were used to score sequence compatibility with a library of known structures: the vWF domain fold
was predicted to be a doubly-wound, open, twisted beta-sheet flanked by alpha-helices [5]. 3D structures have been
determined for the I-domains of integrins CD11b (with bound magnesium) [6] and CD11a (with bound manganese) [7].
The domain adopts a classic alpha/beta Rossmann fold and contains an unusual metal ion coordination site at its
surface. It has been suggested that this site represents a general metal ion-dependent adhesion site (MIDAS) for
binding protein ligands [6]. The residues constituting the MIDAS motif in the CD11b and CD11a I-domains are
completely conserved, but the manner in which the metal ion is coordinated differs slightly [7].

[1623] VWFADOMAIN is a 3-element fingerprint that provides a signature for the vWF domain superfamily. The
fingerprint was derived from an initial alignment of 14 sequences: motif 1 includes the first beta-strand and 3 conserved
residues involved in metal ion coordination in I-domains (Asp and 2 serines in positions 8, 10 and 12, respectively);
motif 2 spans strands beta-2 and beta-2'; and motif 3 encodes beta-strand 3 and a conserved Asp (in position 7), which
coordinates the metal ion [6,7]. Three iterations on OWL27.0 were required to reach convergence, at which point a
true set comprising 56 sequences was identified. Numerous partial matches were also found.
[1624] 717. (WD40) WD domain, G-beta repeat

The ancient regulatory-protein family of WD-repeat proteins.
Neer EJ, Schmidt CJ, Nambudripad R, Smith TF;
Nature 1994;371:297-300.

Beta-transducin (G-beta) is one of the three subunits (alpha, beta, and gamma) of the guanine nucleotide-binding
proteins (G proteins) which act as intermediaries in the transduction of signals generated by transmembrane receptors
[1]. The alpha subunit binds to and hydrolyzes GTP; the functions of the beta and gamma subunits are less clear but
they seem to be required for the replacement of GDP by GTP as well as for membrane anchoring and receptor
recognition.

[1625] In higher eukaryotes G-beta exists as a small multigene family of highly conserved proteins of about 340
amino acid residues. Structurally G-beta consists of eight tandem repeats of about 40 residues, each containing a
central Trp-Asp motif (this type of repeat is sometimes called a WD-40 repeat). Such a repetitive segment has been
shown [E1,2,3,4,5] to exist in a number of other proteins listed below:

- Yeast STE4, a component of the pheromone response pathway. STE4 is a G-beta like protein that associates with
GPA1 (G-alpha) and STE18 (G-gamma).
- Yeast MSI1, a negative regulator of RAS-mediated cAMP synthesis. MSI1 is most probably also a G-beta protein.
- Human and chicken protein 12.3. The function of this protein is not known, but on the basis of its similarity to G-
beta proteins, it may also function in signal transduction.
- Chlamydomonas reinhardtii gblp. This protein is most probably the homolog of vertebrate protein 12.3.
- Human LIS1, a neuronal protein involved in type-1 lissencephaly [E2].
- Mammalian coatomer beta' subunit (beta'-COP), a component of a cytosolic protein complex that reversibly
associates with Golgi membranes to form vesicles that mediate biosynthetic protein transport.
- Yeast CDC4, essential for initiation of DNA replication and separation of the spindle pole bodies to form the poles
of the mitotic spindle.
- Yeast CDC20, a protein required for two microtubule-dependent processes: nuclear movements prior to anaphase
and chromosome separation.
- Yeast MAK11, essential for cell growth and for the replication of M1 double-stranded RNA.
- Yeast PRP4, a component of the U4/U6 small nuclear ribonucleoprotein with a probable role in mRNA splicing.
- Yeast PWP1, a protein of unknown function.
- Yeast SKI8, a protein essential for controlling the propagation of double-stranded RNA.
- Yeast SOF1, a protein required for ribosomal RNA processing which associates with U3 small nucleolar RNA.
- Yeast TUP1 (also known as AER2 or SFL2 or CYC9), a protein which has been implicated in dTMP uptake,
catabolite repression, mating sterility, and many other phenotypes.
- Yeast YCR57c, an ORF of unknown function from chromosome III.
- Yeast YCR72c, an ORF of unknown function from chromosome III.
- Slime mold coronin, an actin-binding protein.
- Slime mold AAC3, a developmentally regulated protein of unknown function.
- Drosophila protein Groucho (formerly known as E(spl); 'enhancer of split'), a protein involved in neurogenesis and
that seems to interact with the Notch and Delta proteins.
- Drosophila TAF-II-80, a protein that is tightly associated with TFIID.

[1626] The number of repeats in the above proteins varies between 5 (PRP4, TUP1, and Groucho) and 8 (G-beta,
STE4, MSI1, AAC3, CDC4, PWP1, etc.). In G-beta and G-beta like proteins, the repeats span the entire length of the
sequence, while in other proteins, they make up the N-terminal, the central or the C-terminal section.
[1627] A signature pattern can be developed from the central core of the domain (positions 9 to 23).

- Consensus pattern: [LIVMSTAC]-[LIVMFYWSTAGC]-[LIMSTAG]-[LIVMSTAGC]-x(2)-[DN]-
x(2)-[LIVMWSTAC]-x-[LIVMFSTAG]-W-[DEN]-[LIVMFSTAGCN]

[ 1] Gilman A.G.
Annu. Rev. Biochem. 56:615-649(1987).
[ 2] Duronio R.J., Gordon J.I., Boguski M.S.
Proteins 13:41-56(1992).
[ 3] van der Voorn L., Ploegh H.L.
FEBS Lett. 307:131-134(1992).
[ 4] Neer E.J., Schmidt C.J., Nambudripad R., Smith T.F.
Nature 371:297-300(1994).
[ 5] Smith T.F., Gaitatzes C.G., Saxena K., Neer E.J.
Biochemistry In Press(1998).
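
Because the number of repeats varies between proteins, a simple way to use the WD signature is to count its non-overlapping occurrences in a sequence. The sketch below is not part of the original disclosure; it hand-translates the consensus pattern, read as [LIVMSTAC]-[LIVMFYWSTAGC]-[LIMSTAG]-[LIVMSTAGC]-x(2)-[DN]-x(2)-[LIVMWSTAC]-x-[LIVMFSTAG]-W-[DEN]-[LIVMFSTAGCN], into a regular expression. The function name and the test sequence are hypothetical.

```python
import re

# Hand translation of the WD-repeat consensus pattern (assumed reading, see lead-in).
WD_REPEAT = re.compile(
    r"[LIVMSTAC][LIVMFYWSTAGC][LIMSTAG][LIVMSTAGC].{2}[DN].{2}"
    r"[LIVMWSTAC].[LIVMFSTAG]W[DEN][LIVMFSTAGCN]"
)


def count_signature_hits(sequence):
    """Count non-overlapping matches of the WD signature in a sequence."""
    return len(WD_REPEAT.findall(sequence))


# Hypothetical 15-residue unit that satisfies the pattern; not a real repeat.
unit = "LASGSKDGTIKLWDV"
# Three units separated by proline spacers that cannot match the pattern.
toy_sequence = "PPPPPPPPPP".join([unit] * 3)
print(count_signature_hits(toy_sequence))
```

The signature covers only the central core of each repeat, so a repeat that diverges from the consensus in that core would not be counted by this sketch.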


[1628] 718. WHEP-TRS domain containing proteins

A conserved domain of 46 amino acids has been shown [1] to exist in a number of higher eukaryote aminoacyl-transfer
RNA synthetases. This domain is present one to six times in the following enzymes:






- Mammalian multifunctional aminoacyl-tRNA synthetase. The domain is present three times in a region that
separates the N-terminal glutamyl-tRNA synthetase domain from the C-terminal prolyl-tRNA synthetase domain.
- Drosophila multifunctional aminoacyl-tRNA synthetase. The domain is present six times in the intercatalytic region.
- Mammalian tryptophanyl-tRNA synthetase. The domain is found at the N-terminal extremity.
- Mammalian, insect, nematode and plant glycyl-tRNA synthetase. The domain is found at the N-terminal extremity
[2].
- Mammalian histidyl-tRNA synthetase. The domain is found at the N-terminal extremity.

[1629] This domain, which is called WHEP-TRS, could contain a central alpha-helical region and may play a role in
the association of tRNA-synthetases into multienzyme complexes.

[1630] A signature pattern based on the first 29 positions of the WHEP domain has been developed.

- Consensus pattern: [QY]-G-[DNEA]-x-[LIV]-[KR]-x(2)-K-x(2)-[KRNG]-[AS]-x(4)-[LIV]-[DENK]-x(2)-[IV]-x(2)-L-x(3)-K

[ 1] Cerini C., Kerjan P., Astier M., Gratecos D., Mirande M., Semeriva M. EMBO J. 10:4267-4277(1991).
[ 2] Nada S., Chang P.K., Dignam J.D.
J. Biol. Chem. 268:7660-7667(1993).

[1631] 719. (Worm family 8) Putative membrane protein

Analysis of protein domain families in Caenorhabditis elegans.
Sonnhammer EL, Durbin R;
Genomics 1997;46:200-216.

This family, called family 8 in [1], may be a transmembrane protein.
The specific function of this protein is unknown.
[1632] 720. Xylose isomerase

Xylose isomerase (EC 5.3.1.5) [1] is an enzyme found in microorganisms which catalyzes the interconversion of D-
xylose to D-xylulose. It can also isomerize D-ribose to D-ribulose and D-glucose to D-fructose. Xylose isomerase seems
to require magnesium for its activity, while cobalt is necessary to stabilize the tetrameric structure of the enzyme. A
number of residues are conserved in all known xylose isomerases.

[1633] Xylose isomerase also exists in plants [2], where it is homodimeric and manganese-dependent.
[1634] Two signature patterns for xylose isomerase have been developed. The first one is derived from a stretch
of five conserved amino acids that includes a glutamic acid residue known to be one of the four residues involved in
the binding of the magnesium ion [3]; this pattern also includes a lysine residue which is involved in the catalytic activity.
The second pattern is derived from a conserved region in the N-terminal section of the enzyme that includes a histidine
residue which has been shown [4] to be involved in the catalytic mechanism of the enzyme.

- Consensus pattern: [LI]-E-P-K-P-x(2)-P
[E is a magnesium ligand]
[K is an active site residue]

- Consensus pattern: [FL]-H-D-x-D-[LIV]-x-[PD]-x-[GDE]
[H is an active site residue]

[ 1] Dauter Z., Dauter M., Hemker J., Witzel H., Wilson K.S.
FEBS Lett. 247:1-8(1989).
[ 2] Kristo P.A., Saarelainen R., Fagerstrom R., Aho S., Korhola M.
Eur. J. Biochem. 237:240-246(1996).
[ 3] Henrick K., Collyer C.A., Blow D.M.
J. Mol. Biol. 208:129-157(1989).
[ 4] Vangrysperre W., Ampe C., Kersters-Hilderson H., Tempst P.
Biochem. J. 253:195-199(1989).

[1635] 721. XPG protein signatures. Xeroderma pigmentosum (XP) [1] is a human autosomal recessive disease, characterized by a high incidence of sunlight-induced skin cancer. Skin cells of people with this condition are hypersensitive to ultraviolet light, due to defects in the incision step of DNA excision repair. There are a minimum of seven genetic complementation groups involved in this pathway: XP-A to XP-G. The defect in XP-G can be corrected by a 133 Kd nuclear protein called XPG (or XPGC) [2]. XPG belongs to a family of proteins [2,3,4,5,6] that are composed of two main subsets:
- Subset 1, to which belong XPG, RAD2 from budding yeast and rad13 from fission yeast. RAD2 and XPG are single-stranded DNA endonucleases [7,8]. XPG makes the 3' incision in human DNA nucleotide excision repair [9].
- Subset 2, to which belong mouse and human FEN-1, rad2 from fission yeast, and RAD27 from budding yeast. FEN-1 is a structure-specific endonuclease.
In addition to the proteins listed in the above groups, this family also includes:
- Fission yeast exo1, a 5'->3' double-stranded DNA exonuclease that could act in a pathway that corrects mismatched base pairs.
- Yeast EXO1 (DHS1), a protein with probably the same function as exo1.
- Yeast DIN7.
Sequence alignment of this family of proteins reveals that similarities are largely confined to two regions. The first is located at the N-terminal extremity (N-region) and corresponds to the first 95 to 105 amino acids. The second region is internal (I-region) and found towards the C-terminus; it spans about 140 residues and contains a highly conserved core of 27 amino acids that includes a conserved pentapeptide (E-A-[DE]-A-[QS]). It is possible that the conserved acidic residues are involved in the catalytic mechanism of DNA excision repair in XPG. The amino acids linking the N- and I-regions are not conserved; indeed, they are largely absent from proteins belonging to the second subset. Two signature patterns have been developed for these proteins. The first corresponds to the central part of the N-region, the second to part of the I-region and includes the putative catalytic core pentapeptide.

[1636] Consensus pattern: [VI]-[KRE]-P-x-[FYIL]-V-F-D-G-x(2)-[PIL]-x-[LVC]-K

Consensus pattern: [GS]-[LIVM]-[PER]-[FYS]-[LIVM]-x-A-P-x-E-A-[DE]-[PAS]-[QS]-[CLM]
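Consensus patterns of this kind can be applied to a sequence by translating them into a regular expression. The following is a minimal sketch, assuming the usual reading of the notation (x is any residue, [..] lists allowed residues, {..} lists excluded residues, and (n) or (n,m) is a repeat count). The function name prosite_to_regex and the toy sequence are illustrative assumptions and are not part of the original disclosure.

```python
import re

# One pattern element: x, a single residue, [allowed] or {excluded},
# optionally followed by a repeat count such as (2) or (2,4).
_ELEMENT = re.compile(
    r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?"
)


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style consensus pattern into a Python regex string."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unrecognised pattern element: {element!r}")
        core, low, high = m.groups()
        if core == "x":
            regex = "."                          # any residue
        elif core.startswith("{"):
            regex = "[^" + core[1:-1] + "]"      # any residue except these
        else:
            regex = core                         # literal residue or [class]
        if low is not None:
            regex += "{" + low + ("," + high if high else "") + "}"
        parts.append(regex)
    return "".join(parts)


# The first XPG pattern as read here, applied to a made-up toy sequence.
XPG_N_REGION = "[VI]-[KRE]-P-x-[FYIL]-V-F-D-G-x(2)-[PIL]-x-[LVC]-K"
toy_sequence = "MGVKPAFVFDGAAPALKTT"
hit = re.search(prosite_to_regex(XPG_N_REGION), toy_sequence)
```

Here hit.span() gives the location of the match in the toy sequence; a real search would run the same expression over database sequences.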

[1637] [ 1] Tanaka K., Wood R.D. Trends Biochem. Sci. 19:83-86(1994).
[ 2] Scherly D., Nouspikel T., Corlet J., Ucla C., Bairoch A., Clarkson S.G. Nature 363:182-185(1993).
[ 3] Carr A.M., Sheldrick K.S., Murray J.M., Al-Harithy R., Watts F.Z., Lehmann A.R. Nucleic Acids Res. 21:1345-1349(1993).
[ 4] Murray J.M., Tavassoli M., Al-Harithy R., Sheldrick K.S., Lehmann A.R., Carr A.M., Watts F.Z. Mol. Cell. Biol. 14:4878-4888(1994).
[ 5] Harrington J.J., Lieber M.R. Genes Dev. 8:1344-1355(1994).
[ 6] Szankasi P., Smith G.R. Science 267:1166-1169(1995).
[ 7] Habraken Y., Sung P., Prakash L., Prakash S. Nature 366:365-368(1993).
[ 8] O'Donovan A., Scherly D., Clarkson S.G., Wood R.D. J. Biol. Chem. 269:15965-15968(1994).
[ 9] O'Donovan A., Davies A.A., Moggs J.G., West S.C., Wood R.D. Nature 371:432-435(1994).

[1638] 722. Xanthine/uracil permeases family

The following transport proteins, which are involved in the uptake of xanthine or uracil, are evolutionarily related [1]:

- Uric acid-xanthine permease (gene uapA) from Aspergillus nidulans.
- Purine permease (gene uapC) from Aspergillus nidulans.
- Xanthine permease from Bacillus subtilis (gene pbuX).
- Uracil permease from Escherichia coli (gene uraA) [2] and Bacillus (gene pyrP).
- Hypothetical protein ycdG from Escherichia coli.
- Hypothetical protein ygfO from Escherichia coli.
- Hypothetical protein ygfU from Escherichia coli.
- Hypothetical protein yicE from Escherichia coli.
- Hypothetical protein yunJ from Bacillus subtilis.
- Hypothetical protein yunK from Bacillus subtilis.

[1639] They are proteins of from 430 to 595 residues that seem to contain transmembrane domains. The best conserved region, which corresponds with what seems to be the tenth transmembrane domain, has been selected as a signature pattern.

- Consensus pattern: [LIVM]-P-x-[PASIF]-V-[LIVM]-G-G-x(4)-[LIVM]-[FY]-[GSA]-x-[LIVM]-x(3)-G

[ 1] Diallinas G., Gorfinkiel L., Arst H.N. Jr., Cecchetto G., Scazzocchio C. J. Biol. Chem. 270:8610-8622(1995).

[ 2] Andersen P.S., Frees D., Fast R., Mygind B. J. Bacteriol. 177:2008-2013(1995).

[1640] 723. Hypothetical yabO/yceC/sfhB family

The following proteins, which seem to belong to a family of pseudouridine synthases (EC 4.2.1.70) [1], have been shown to share regions of similarities:

- Escherichia coli and Haemophilus influenzae ribosomal large subunit pseudouridine synthase A (gene rluA). It is responsible for synthesis of pseudouridine from uracil-746 in 23S rRNA.
- Escherichia coli and Haemophilus influenzae ribosomal large subunit pseudouridine synthase C (gene rluC). It is responsible for synthesis of pseudouridine from uracil at positions 955, 2504 and 2580 in 23S rRNA.
- Escherichia coli protein and homologs in other bacteria large subunit pseudouridine synthase D (gene rluD).
- Yeast DRAP deaminase (gene RIB2).
- Escherichia coli hypothetical protein yqcB and HI1435, the corresponding Haemophilus influenzae protein.
- Haemophilus influenzae hypothetical protein HI0042.

- Aquifex aeolicus hypothetical protein AQ_1758.
- Bacillus subtilis hypothetical protein yheT.
- Bacillus subtilis hypothetical protein yjbO.
- Bacillus subtilis hypothetical protein yiyB.
- Helicobacter pylori hypothetical protein HP0347.
- Helicobacter pylori hypothetical protein HP0745.
- Helicobacter pylori hypothetical protein HP0958.
- Mycoplasma genitalium hypothetical protein MG209.
- Mycoplasma genitalium hypothetical protein MG370.
- Synechocystis strain PCC 6803 hypothetical protein slr1592.
- Synechocystis strain PCC 6803 hypothetical protein slr1629.
- Yeast hypothetical protein YDL036c.
- Yeast hypothetical protein YGR169c.
- Fission yeast hypothetical protein SpAC18B11.02c.
- Caenorhabditis elegans hypothetical protein K07E8.7.

[1641] These are proteins of from 21 to 50 Kd which contain a number of conserved regions in their central section. They can be picked up in the database by the following highly conserved pattern.

- Consensus pattern: [LIVCA]-[NHYT]-R-[LI]-D-x(2)-T-[STA]-G-[LIVAGC]-[LIVMF](2)-[LIVMFGC]-[SGTACV]

[1642] [ 1] Conrad J., Sun D., Englund N., Ofengand J. J. Biol. Chem. 273:18562-18566(1998).
[1643] In addition, the following bacterial proteins, which seem to belong to a family of pseudouridine synthases (EC 4.2.1.70) [1], have also been shown to share regions of similarities:

- Escherichia coli and Haemophilus influenzae 16S pseudouridylate 516 synthase (EC 4.2.1.70) (gene: rsuA). This enzyme is responsible for the formation of pseudouridine from uracil-516 in 16S ribosomal RNA.
- Escherichia coli hypothetical protein yciL and HI1199, the corresponding Haemophilus influenzae protein.
- Escherichia coli hypothetical protein yjbC.
- Escherichia coli hypothetical protein ymfC and HI0694, the corresponding Haemophilus influenzae protein.
- Aquifex aeolicus hypothetical protein AQ_554.
- Aquifex aeolicus hypothetical protein AQ_1464.
- Bacillus subtilis hypothetical protein ypuL.
- Bacillus subtilis hypothetical protein ytzF.
- Borrelia burgdorferi hypothetical protein BB0129.
- Helicobacter pylori hypothetical protein HP1459.
- Synechocystis strain PCC 6803 hypothetical protein slr0361.
- Synechocystis strain PCC 6803 hypothetical protein slr0612.

[1644] These are proteins of from 25 to 40 Kd which contain a number of conserved regions in their central section. They can be picked up in the database by the following highly conserved pattern.

- Consensus pattern: G-R-L-D-x(2)-[STA]-x-G-[LIVFA]-[LIVMF](3)-[ST]-[DNST]

[1645] [ 1] Wrzesinski J., Bakin A., Nurse K., Lane B.G., Ofengand J. Biochemistry 34:8904-8913(1995).
[1646] 724. Zinc finger present in dystrophin, CBP/p300
ZZ in dystrophin binds calmodulin.
Putative zinc finger; binding not yet shown.
[1647] 725. Zinc carboxypeptidase

There are a number of different types of zinc-dependent carboxypeptidases (EC 3.4.17.-) [1,2]. All these enzymes seem to be structurally and functionally related. The enzymes that belong to this family are listed below.

- Carboxypeptidase A1 (EC 3.4.17.1), a pancreatic digestive enzyme that can remove all C-terminal amino acids with the exception of Arg, Lys and Pro.
- Carboxypeptidase A2 (EC 3.4.17.15), a pancreatic digestive enzyme with a specificity similar to that of carboxypeptidase A1, but with a preference for bulkier C-terminal residues.
- Carboxypeptidase B (EC 3.4.17.2), also a pancreatic digestive enzyme, but one that preferentially removes C-terminal Arg and Lys.
- Carboxypeptidase N (EC 3.4.17.3) (also known as arginine carboxypeptidase), a plasma enzyme which protects the body from potent vasoactive and inflammatory peptides containing C-terminal Arg or Lys (such as kinins or anaphylatoxins) which are released into the circulation.
- Carboxypeptidase H (EC 3.4.17.10) (also known as enkephalin convertase or carboxypeptidase E), an enzyme located in secretory granules of pancreatic islets, adrenal gland, pituitary and brain. This enzyme removes residual C-terminal Arg or Lys remaining after initial endoprotease cleavage during prohormone processing.
- Carboxypeptidase M (EC 3.4.17.12), a membrane bound Arg and Lys specific enzyme. It is ideally situated to act on peptide hormones at local tissue sites where it could control their activity before or after interaction with specific plasma membrane receptors.
- Mast cell carboxypeptidase (EC 3.4.17.1), an enzyme with a specificity similar to that of carboxypeptidase A, but found in the secretory granules of mast cells.
- Streptomyces griseus carboxypeptidase (Cpase SG) (EC 3.4.17.-) [3], which combines the specificities of mammalian carboxypeptidases A and B.
- Thermoactinomyces vulgaris carboxypeptidase T (EC 3.4.17.18) (CPT) [4], which also combines the specificities of carboxypeptidases A and B.
- AEBP1 [5], a transcriptional repressor active in preadipocytes. AEBP1 seems to regulate transcription by cleavage of other transcriptional proteins.
- Yeast hypothetical protein YHR132c.

[1648] All of these enzymes bind an atom of zinc. Three conserved residues are implicated in the binding of the zinc atom: two histidines and a glutamic acid. Two signature patterns which contain these three zinc ligands have been derived.

- Consensus pattern: [PK]-x-[LIVMFY]-x-[LIVMFY]-x(4)-H-[STAG]-x-E-x-[LIVM]-[STAG]-x(6)-[LIVMFYTA] [H and E are zinc ligands]

- Consensus pattern: H-[STAG]-x(3)-[LIVME]-x(2)-[LIVMFYW]-P-[FYW] [H is a zinc ligand]

[ 1] Tan F., Chan S.J., Steiner D.F., Schilling J.W., Skidgel R.A.
J. Biol. Chem. 264:13165-13170(1989).
[ 2] Reynolds D.S., Stevens R.L., Gurley D.S., Lane W.S., Austen K.F., Serafin W.E.
J. Biol. Chem. 264:20094-20099(1989).
[ 3] Narahashi Y.
J. Biochem. 107:879-886(1990).
[ 4] Teplyakov A., Polyakov K., Obmolova G., Strokopytov B., Kuranova I., Osterman A.L., Grishin N.V., Smulevitch S.V., Zagnitko O.P., Galperina O.V., Matz M.V., Stepanov V.M.
Eur. J. Biochem. 208:281-288(1992).
[ 5] He G.-P., Muise A., Li A.W., Ro H.-S.
Nature 378:92-96(1995).
[ 6] Hourdou M.-L., Guinand M., Vacheron M.J., Michel G., Denoroy L., Duez C.M., Englebert S., Joris B., Weber G., Ghuysen J.-M.
Biochem. J. 292:563-570(1993).
[ 7] Rawlings N.D., Barrett A.J.
Meth. Enzymol. 248:183-228(1995).

[1649] 726. Zinc finger, C2H2 type

The C2H2 zinc finger is the classical zinc finger domain.

The two conserved cysteines and histidines co-ordinate a zinc ion. The following pattern describes the zinc finger.
#-X-C-X(1-5)-C-X3-#-X5-#-X2-H-X(3-6)-[H/C]

Where X can be any amino acid, and numbers in brackets indicate the number of residues. The positions marked # are those that are important for the stable fold of the zinc finger. The final position can be either his or cys.

The C2H2 zinc finger is composed of two short beta strands followed by an alpha helix. The amino terminal part of the helix binds the major groove in DNA binding zinc fingers.

[1650] 'Zinc finger' domains [1-5] are nucleic acid-binding protein structures first identified in the Xenopus transcription factor TFIIIA. These domains have since been found in numerous nucleic acid-binding proteins. A zinc finger domain is composed of 25 to 30 amino-acid residues. There are two cysteine or histidine residues at both extremities of the domain, which are involved in the tetrahedral coordination of a zinc atom. It has been proposed that such a domain interacts with about five nucleotides. A schematic representation of a zinc finger domain is shown below:



[Schematic: a loop of residues (x) closed at its base by a cysteine (C) and a histidine (H), both bound to a zinc (Zn) atom]
[1651] Many classes of zinc fingers are characterized according to the number and positions of the histidine and cysteine residues involved in the zinc atom coordination. In the first class to be characterized, called C2H2, the first pair of zinc coordinating residues are cysteines while the second pair are histidines. A number of experimental reports have demonstrated the zinc-dependent DNA or RNA binding property of some members of this class.
[1652] Some of the proteins known to include C2H2-type zinc fingers are listed below. The number of zinc finger regions found in each of these proteins is indicated between brackets; a '+' symbol indicates that only partial sequence data is available and that additional finger domains may be present.

- Saccharomyces cerevisiae: ACE2 (3), ADR1 (2), AZF1 (4), FZF1 (5), MIG1 (2), MSN2 (2), MSN4 (2), RGM1 (2), RIM1 (3), RME1 (3), SFP1 (2), SSL1 (1), STP1 (3), SWI5 (3), VAC1 (1) and ZMS1 (2).

- Emericella nidulans: brlA (2), creA (2).

- Drosophila: AEF-1 (4), Cf2 (7), ci-D (5), Disconnected (2), Escargot (5), Glass (5), Hunchback (8), Kruppel (5), Kruppel-H (4+), Odd-skipped (4), Odd-paired (4), Pep (3), Snail (5), Spalt-major (7), Serependity locus beta (6), delta (7), h-1 (8), Suppressor of hairy wing su(Hw) (12), Suppressor of variegation suvar(3)7 (5), Teashirt (3) and Tramtrack (2).

- Xenopus: transcription factor TFIIIA (9), p43 from RNP particle (9), Xfin (37 !!), Xsna (5), gastrula XlcGF5.1 to XlcGF71.1 (from 4+ to 11+), Oocyte XlcOF2 to XlcOF22 (from 7 to 12).

- Mammalian: basonuclin (6), BCL-6/LAZ-3 (6), erythroid krueppel-like transcription factor (3), transcription factors Sp1 (3), Sp2 (3), Sp3 (3) and Sp4 (3), transcriptional repressor YY1 (4), Wilms' tumor protein (4), EGR1/Krox24 (3), EGR2/Krox20 (3), EGR3/Pilot (3), EGR4/AT133 (4), Evi-1 (10), GLI1 (5), GLI2 (4+), GLI3 (3+), HIV-EP1/ZNF40 (4), HIV-EP2 (2), KR1 ($♦), KR2 (0), KR3 (15+), KR4 (14+), KR5 (11+), HF.12 (<3+), REX-1 (4), ZfX (13), ZfY (13), Zfp-35 (18), ZNF7 (15), ZNF8 (7), ZNF35 (10), ZNF42/MZF-1 (13), ZNF43 (22), ZNF46/Kup (2), ZNF76 (7), ZNF91 (36), ZNF133 (3).

[1653] In addition to the conserved zinc ligand residues it has been shown [6] that a number of other positions are also important for the structural integrity of the C2H2 zinc fingers. The best conserved position is found four residues after the second cysteine; it is generally an aromatic or aliphatic residue.

- Consensus pattern: C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H [The two C's and two H's are zinc ligands]
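The bracketed finger counts in the protein list above correspond to the number of separate matches of such a pattern within one sequence. A minimal sketch of that count follows, assuming the pattern reads C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H; the function name and the artificial two-finger sequence are illustrative assumptions, not data from the disclosure.

```python
import re

# C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H written as a regular expression.
C2H2 = re.compile(r"C.{2,4}C.{3}[LIVMFYWC].{8}H.{3,5}H")


def find_c2h2_fingers(sequence):
    """Return (start, end) spans of non-overlapping matches, 0-based, end exclusive."""
    return [m.span() for m in C2H2.finditer(sequence)]


# Artificial finger: C, 2 spacers, C, 3 spacers, F, 8 spacers, H, 3 spacers, H.
finger = "C" + "AA" + "C" + "AAA" + "F" + "A" * 8 + "H" + "AAA" + "H"
# Two copies joined by a short linker; this is not a real protein sequence.
toy_protein = "MG" + finger + "TGEKP" + finger + "K"

spans = find_c2h2_fingers(toy_protein)
finger_count = len(spans)
```

Because finditer reports non-overlapping matches from left to right, closely packed or degenerate fingers may be under-counted; a curated count such as those listed above would normally be checked by hand.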

[ 1] Klug A., Rhodes D.
Trends Biochem. Sci. 12:464-469(1987).
[ 2] Evans R.M., Hollenberg S.M.
Cell 52:1-3(1988).
[ 3] Payre F., Vincent A.
FEBS Lett. 234:245-250(1988).
[ 4] Miller J., McLachlan A.D., Klug A.
EMBO J. 4:1609-1614(1985).
[ 5] Berg J.M.
Proc. Natl. Acad. Sci. U.S.A. 85:99-102(1988).
[ 6] Rosenfeld R., Margalit H.
J. Biomol. Struct. Dyn. 11:557-570(1993).

[1654] 727. Zinc finger, C3HC4 type (RING finger)

A number of eukaryotic and viral proteins contain a conserved cysteine-rich domain of 40 to 60 residues (called C3HC4 zinc-finger or 'RING' finger) [1] that binds two atoms of zinc, and is probably involved in mediating protein-protein interactions. The 3D structure of the zinc ligation system is unique to the RING domain and is referred to as the 'cross-brace' motif. The spacing of the cysteines in such a domain is C-x(2)-C-x(9 to 39)-C-x(1 to 3)-H-x(2 to 3)-C-x(2)-C-x(4 to 48)-C-x(2)-C.
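The ligand spacing of the RING domain can be turned into a search expression that also reports where each of the eight zinc ligands falls. The sketch below assumes the spacing reads C-x(2)-C-x(9 to 39)-C-x(1 to 3)-H-x(2 to 3)-C-x(2)-C-x(4 to 48)-C-x(2)-C; the names LIGANDS, SPACERS, ring_regex and ligand_positions are hypothetical, and the test sequence is artificial.

```python
import re

# The eight zinc ligands in order, and the (min, max) number of residues
# between consecutive ligands, as read from the spacing quoted above.
LIGANDS = ["C", "C", "C", "H", "C", "C", "C", "C"]
SPACERS = [(2, 2), (9, 39), (1, 3), (2, 3), (2, 2), (4, 48), (2, 2)]


def ring_regex(ligands=LIGANDS, spacers=SPACERS):
    """Build a regex with one capture group per ligand residue."""
    pieces = []
    for i, residue in enumerate(ligands):
        pieces.append(f"({residue})")
        if i < len(spacers):
            low, high = spacers[i]
            pieces.append(f".{{{low},{high}}}")
    return re.compile("".join(pieces))


def ligand_positions(sequence):
    """Return 1-based positions of the eight ligands, or None if no match."""
    m = ring_regex().search(sequence)
    if m is None:
        return None
    return [m.start(group) + 1 for group in range(1, len(LIGANDS) + 1)]


# Artificial sequence using the shortest allowed spacers throughout.
toy_ring = ("C" + "AA" + "C" + "A" * 9 + "C" + "A" + "H" + "AA"
            + "C" + "AA" + "C" + "A" * 4 + "C" + "AA" + "C")
positions = ligand_positions(toy_ring)
```

With wide spacer ranges a real sequence can satisfy the expression in more than one way; the regex engine returns only the first match it finds, so the reported positions should be treated as a candidate assignment.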

[1655] Proteins currently known to include the C3HC4 domain are listed below (references are only provided for recently determined sequences).

- Mammalian V(D)J recombination activating protein (gene RAG1). RAG1 activates the rearrangement of immunoglobulin and T-cell receptor genes.
- Mouse rpt-1. Rpt-1 is a trans-acting factor that regulates gene expression directed by the promoter region of the interleukin-2 receptor alpha chain or the LTR promoter region of HIV-1.
- Human rfp. Rfp is a developmentally regulated protein that may function in male germ cell development. Recombination of the N-terminal section of rfp with a protein tyrosine kinase produces the ret transforming protein.
- Human 52 Kd Ro/SS-A protein, a protein of unknown function from the Ro/SS-A ribonucleoprotein complex. Sera from patients with systemic lupus erythematosus or primary Sjogren's syndrome often contain antibodies that react with the Ro proteins.
- Human histocompatibility locus protein RING1.
- Human PML, a probable transcription factor. Chromosomal translocation of PML with retinoic receptor alpha creates a fusion protein which is the cause of acute promyelocytic leukemia (APL).
- Mammalian breast cancer type 1 susceptibility protein (BRCA1) [E1].
- Mammalian cbl proto-oncogene.
- Mammalian bmi-1 proto-oncogene.
- Vertebrate CDK-activating kinase (CAK) assembly factor MAT1, a protein that stabilizes the complex between the CDK7 kinase and cyclin H (MAT1 stands for 'Menage A Trois').
- Mammalian mel-18 protein. Mel-18, which is expressed in a variety of tumor cells, is a transcriptional repressor that recognizes and binds a specific DNA sequence.
- Mammalian peroxisome assembly factor-1 (PAF-1) (PMP35), which is somewhat involved in the biogenesis of peroxisomes. In humans, defects in PAF-1 are responsible for a form of Zellweger syndrome, an autosomal recessive disorder associated with peroxisomal deficiencies.
- Human MAT1 protein, which interacts with the CDK7-cyclin H complex.
- Human RING1 protein.
- Xenopus XNF7 protein, a probable transcription factor.
- Trypanosoma protein ESAG-8 (T-LR), which may be involved in the postranscriptional regulation of genes in VSG expression sites or may interact with adenylate cyclase to regulate its activity.
- Drosophila proteins Posterior Sex Combs (Psc) and Suppressor two of zeste (Su(z)2). The two proteins belong to the Polycomb group of genes needed to maintain the segment-specific repression of homeotic selector genes.
- Drosophila protein male-specific msl-2, a DNA-binding protein which is involved in X chromosome dosage compensation (the elevation of transcription of the male single X chromosome).
- Arabidopsis thaliana protein COP1, which is involved in the regulation of photomorphogenesis.
- Fungal DNA repair proteins RAD5, RAD16, RAD18 and rad8.
- Herpesviruses trans-acting transcriptional protein ICP0/IE110. This protein, which has been characterized in many different herpesviruses, is a trans-activator and/or -repressor of the expression of many viral and cellular promoters.
- Baculoviruses protein CG30.
- Baculoviruses major immediate early protein (PE-38).
- Baculoviruses immediate early regulatory protein IE-N.
- Caenorhabditis elegans hypothetical proteins F54G8.4, R05D3.4 and T02C1.1.
- Yeast hypothetical proteins YER116c and YKR017c.

[1656] The central region of the domain was selected as a signature pattern for the C3HC4 finger.

- Consensus pattern: C-x-H-x-[LIVMFY]-C-x(2)-C-[LIVMYA]

[1657] [ 1] Borden K.L.B., Freemont P.S.
Curr. Opin. Struct. Biol. 6:395-401(1996).

[1658] 728. Zinc finger C-x8-C-x5-C-x3-H type (and similar)
[1659] 729. Zinc finger, CCHC class

A family of CCHC zinc fingers, mostly from retroviral gag proteins (nucleocapsid). Prototype structure is from HIV. Also contains members involved in eukaryotic gene regulation, such as C. elegans GLH-1. Structure is an 18-residue zinc finger; no examples of indels in the alignment.
[1660] 730. Zn-finger in Ran binding protein and others.
[1661] 731. AN1-like Zinc finger

[1662] Zinc finger at the C-terminus of An1 (Swiss:Q91889), a ubiquitin-like protein in Xenopus laevis. The following pattern describes the zinc finger: C-X2-C-X(9-12)-C-X(1-2)-C-X4-C-X2-H-X5-H-X-C, where X can be any amino acid, and numbers in brackets indicate the number of residues.

[1663] [1] Linnen JM, Bailey CP, Weeks DL; Gene 1993;128:181-188.
[1664] 732. 14-3-3 proteins

Structure of a 14-3-3 protein and implications for coordination of multiple signalling pathways.
Xiao B, Smerdon SJ, Jones DH, Dodson GG, Soneji Y, Aitken A, Gamblin SJ;
Nature 1995;376:188-191.
Crystal structure of the zeta isoform of the 14-3-3 protein.
Liu D, Bienkowska J, Petosa C, Collier RJ, Fu H, Liddington R;
Nature 1995;376:191-194.

[1665] Interaction of 14-3-3 with signaling proteins is mediated by the recognition of phosphoserine.
Muslin AJ, Tanner JW, Allen PM, Shaw AS;
Cell 1996;84:889-897.

[1666] The 14-3-3 protein binds its target proteins with a common site located towards the C-terminus.
Ichimura T, Ito M, Itagaki C, Takahashi M, Horigome T, Omata S, Ohno S, Isobe T;
FEBS Lett 1997;413:273-276.

[1667] Molecular evolution of the 14-3-3 protein family.
Wang W, Shakes DC;
J Mol Evol 1996;43:384-398.
Function of 14-3-3 proteins.
Jin DY, Lyu MS, Kozak CA, Jeang KT;
Nature 1996;382:308-308.

[1668] The 14-3-3 proteins [1,2,3] are a family of closely related acidic homodimeric proteins of about 30 Kd which were first identified as being very abundant in mammalian brain tissues and located preferentially in neurons. The 14-3-3 proteins seem to have multiple biological activities and play a key role in signal transduction pathways and the cell cycle. They interact with kinases such as PKC or Raf-1; they also seem to function as protein-kinase dependent activators of tyrosine and tryptophan hydroxylases, and in plants they are associated with a complex that binds to the G-box promoter elements.

[1669] The 14-3-3 family of proteins is ubiquitously found in all eukaryotic species studied and has been sequenced in fungi (yeast BMH1 and BMH2, fission yeast rad24 and rad25), plants, Drosophila, and vertebrates. The sequences of the 14-3-3 proteins are extremely well conserved. Two highly conserved regions have been selected as signature patterns: the first is a peptide of 11 residues located in the N-terminal section; the second, a 20 amino acid region located in the C-terminal section.

- Consensus pattern: R-N-L-[LIV]-S-[VG]-[GA]-Y-[KN]-N-[IVA]

- Consensus pattern: Y-K-[DE]-S-T-L-I-[IM]-Q-L-[LF]-[RHC]-D-N-[LF]-T-[LS]-W-[TAN]-[SAD]
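The stated lengths of the two signature regions (11 and 20 residues) give a simple consistency check on how the patterns are read. The sketch below counts the residues a pattern can span; it assumes the two patterns read as R-N-L-[LIV]-S-[VG]-[GA]-Y-[KN]-N-[IVA] and Y-K-[DE]-S-T-L-I-[IM]-Q-L-[LF]-[RHC]-D-N-[LF]-T-[LS]-W-[TAN]-[SAD], and the function name pattern_length_range is a hypothetical helper.

```python
import re

# One pattern element with an optional repeat count (n) or (n,m).
_ELEMENT = re.compile(
    r"(?:x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?"
)


def pattern_length_range(pattern):
    """Return (min, max) number of residues a PROSITE-style pattern can span."""
    low_total = high_total = 0
    for element in pattern.split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unrecognised pattern element: {element!r}")
        low = int(m.group(1)) if m.group(1) else 1    # no count means one residue
        high = int(m.group(2)) if m.group(2) else low  # no upper bound means fixed
        low_total += low
        high_total += high
    return low_total, high_total


N_TERMINAL = "R-N-L-[LIV]-S-[VG]-[GA]-Y-[KN]-N-[IVA]"
C_TERMINAL = ("Y-K-[DE]-S-T-L-I-[IM]-Q-L-[LF]-[RHC]-D-N-[LF]-"
              "T-[LS]-W-[TAN]-[SAD]")

n_range = pattern_length_range(N_TERMINAL)
c_range = pattern_length_range(C_TERMINAL)
```

Both patterns have fixed length, so the minimum and maximum coincide and agree with the 11-residue and 20-residue regions described in the text.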
[ 1] Aitken A.
Trends Biochem. Sci. 20:95-97(1995).
[ 2] Morrison D.
Science 266:56-57(1994).
[ 3] Xiao B., Smerdon S.J., Jones D.H., Dodson G.G., Soneji Y., Aitken A., Gamblin S.J.
Nature 376:188-191(1995).

[1670] 733. D-isomer specific 2-hydroxyacid dehydrogenases (2-Hacid_DH)

This Pfam covers the Formate dehydrogenase, D-glycerate dehydrogenase and D-lactate dehydrogenase families in SCOP. A number of NAD-dependent 2-hydroxyacid dehydrogenases which seem to be specific for the D-isomer of their substrate have been shown [1,2,3,4] to be functionally and structurally related. These enzymes are listed below.

- D-lactate dehydrogenase (EC 1.1.1.28), a bacterial enzyme which catalyzes the reduction of D-lactate to pyruvate.
- D-glycerate dehydrogenase (EC 1.1.1.29) (NADH-dependent hydroxypyruvate reductase), a plant leaf peroxisomal enzyme that catalyzes the reduction of hydroxypyruvate to glycerate. This reaction is part of the glycolate pathway of photorespiration.
- D-glycerate dehydrogenase from the bacteria Hyphomicrobium methylovorum and Methylobacterium extorquens.
- 3-phosphoglycerate dehydrogenase (EC 1.1.1.95), a bacterial enzyme that catalyzes the oxidation of D-3-phosphoglycerate to 3-phosphohydroxypyruvate. This reaction is the first committed step in the 'phosphorylated' pathway of serine biosynthesis.
- Erythronate-4-phosphate dehydrogenase (EC 1.1.1.-) (gene pdxB), a bacterial enzyme involved in the biosynthesis of pyridoxine (vitamin B6).
- D-2-hydroxyisocaproate dehydrogenase (EC 1.1.1.-) (D-hicDH), a bacterial enzyme that catalyzes the reversible and stereospecific interconversion between 2-ketocarboxylic acids and D-2-hydroxy-carboxylic acids.
- Formate dehydrogenase (EC 1.2.1.2) (FDH) from the bacteria Pseudomonas sp. 101 and various fungi [5].
- Vancomycin resistance protein vanH from Enterococcus faecium; this protein is a D-specific alpha-keto acid dehydrogenase involved in the formation of a peptidoglycan which does not terminate by D-alanine, thus preventing vancomycin binding.
- Escherichia coli hypothetical protein ycdW.
- Escherichia coli hypothetical protein yiaE.
- Haemophilus influenzae hypothetical protein HI1556.
- Yeast hypothetical protein YER081w.
- Yeast hypothetical protein YIL074w.

[1671] All these enzymes have similar enzymatic activities and are structurally related. Three of the most conserved regions of these proteins have been selected to develop patterns. The first pattern is based on a glycine-rich region located in the central section of these enzymes; this region probably corresponds to the NAD-binding domain. The two other patterns contain a number of conserved charged residues, some of which may play a role in the catalytic mechanism.

- Consensus pattern: [LIVMA]-[AG]-[IVT]-[LIVMFY]-[AG]-x-G-[NHKRQGSAC]-[LIV]-G-x(13,14)-[LIVfMT]-x(2)-[FYwCTH]-[DNSTK]

- Consensus pattern: [LIVMFYWA]-[LIVFYWC]-x(2)-[SAC]-[DNQHR]-[IVFA]-[LIVF]-x-[LIVF]-[HNI]-x-P-x(4)-[STN]-x(2)-[LIVMF]-x-[GSDN]

- Consensus pattern: [LMFATC]-[KPQ]-x-[GSTDN]-x-[LIVMFYWR]-[LIVMFYW](2)-N-x-[STAGC]-R-[GP]-x-[LIVH]-[LIVMC]-[DNV]

[1] Grant G.A. Biochem. Biophys. Res. Commun. 165:1371-1374(1989).

[2] Kochhar S., Hunziker P., Leong-Morgenthaler P.M., Hottinger H. Biochem. Biophys. Res. Commun. 184:60-66(1992).

[3] Ohta T., Taguchi H. J. Biol. Chem. 266:12688-12694(1991).

[4] Goldberg J.D., Yoshida T., Brick P. J. Mol. Biol. 236:1123-1140(1994).

[5] Popov V.O., Lamzin V.S. Biochem. J. 301:625-643(1994).

[1672] 734. 2-oxo acid dehydrogenases acyltransferase (catalytic domain)

Refined crystal structure of the catalytic domain of dihydrolipoyl transacetylase (E2P) from Azotobacter vinelandii at 2.6 angstroms resolution.

Mattevi A, Obmolova G, Kalk KH, Westphal AH, De Kok A, Hol WG;
J Mol Biol 1993;230:1183-1199.

These proteins contain one to three copies of a lipoyl binding domain followed by the catalytic domain.
[1673] 735. 3-beta hydroxysteroid dehydrogenase/isomerase family
Structure and tissue-specific expression of 3 beta-hydroxysteroid dehydrogenase/5-ene-4-ene isomerase genes in human and rat classical and peripheral steroidogenic tissues.

Labrie F, Simard J, Luu-The V, Pelletier G, Belanger A, Lachance Y, Zhao HF, Labrie C, Breton N, de Launoit Y, et al;
J Steroid Biochem Mol Biol 1992;41:421-435.
The enzyme 3 beta-hydroxysteroid dehydrogenase/5-ene-4-ene isomerase (3 beta-HSD) catalyzes the oxidation and isomerization of 5-ene-3 beta-hydroxypregnene and 5-ene-hydroxyandrostene steroid precursors into the corresponding 4-ene-ketosteroids necessary for the formation of all classes of steroid hormones.
[1674] 736. 3-hydroxyacyl-CoA dehydrogenase
This family also includes lambda crystallin.
Structure of L-3-hydroxyacyl-coenzyme A dehydrogenase: preliminary chain tracing at 2.8-A resolution.
Birktoft JJ, Holden HM, Hamlin R, Xuong NH, Banaszak LJ;
Proc Natl Acad Sci U S A 1987;84:8262-8266.

[1675] 3-hydroxyacyl-CoA dehydrogenase (EC 1.1.1.35) (HCDH) [1] is an enzyme involved in fatty acid metabolism; it catalyzes the reduction of 3-hydroxyacyl-CoA to 3-oxoacyl-CoA. Most eukaryotic cells have two fatty-acid beta-oxidation systems, one located in mitochondria and the other in peroxisomes. In peroxisomes, 3-hydroxyacyl-CoA dehydrogenase forms, with enoyl-CoA hydratase (ECH) and 3,2-trans-enoyl-CoA isomerase (ECI), a multifunctional enzyme where the N-terminal domain bears the hydratase/isomerase activities and the C-terminal domain the dehydrogenase activity. There are two mitochondrial enzymes: one which is monofunctional and the other which is, like its peroxisomal counterpart, multifunctional.

[1676] In Escherichia coli (gene fadB) and Pseudomonas fragi (gene faoA), HCDH is part of a multifunctional enzyme which also contains an ECH/ECI domain as well as a 3-hydroxybutyryl-CoA epimerase domain [2].
[1677] The other proteins structurally related to HCDH are:

- Bacterial 3-hydroxybutyryl-CoA dehydrogenase (EC 1.1.1.157), which reduces 3-hydroxybutanoyl-CoA to acetoacetyl-CoA [3].
- Eye lens protein lambda-crystallin [4], which is specific to lagomorphs (such as rabbit).

There are two major regions of similarity in the sequences of proteins of the HCDH family: the first one, located in the N-terminal, corresponds to the NAD-binding site; the second one is located in the center of the sequence. A signature pattern has been derived from this central region.

- Consensus pattern: [DNE]-x(2)-[GA]-F-[LIVMFY]-x-[NT]-R-x(3)-[PA]-[LIVMFY](2)-x(5)-[LIVMFYCT]-[LIVMFY]-x(2)-[GV]

[ 1] Birktoft J.J., Holden H.M., Hamlin R., Xuong N.-H., Banaszak L.J. Proc. Natl. Acad. Sci. U.S.A. 84:8262-8266(1987).

[ 2] Nakahigashi K., Inokuchi H. Nucleic Acids Res. 18:4937-4937(1990).

[ 3] Mullany P., Clayton C.L., Pallen M.J., Slone R., Al-Saleh A., Tabaqchali S. FEMS Microbiol. Lett. (1994).

[ 4] Mulders J.W.M., Hendriks W., Blankesteijn W.M., Bloemendal H., de Jong W.W. J. Biol. Chem. 263:15462-15466(1988).

[1678] 737. 60s Acidic ribosomal protein
Proteins P1, P2, and P0, components of the eukaryotic ribosome stalk. New structural and functional aspects.
Remacha M, Jimenez-Diaz A, Santos C, Briones E, Zambrano R, Rodriguez Gabriel MA, Guarinos E, Ballesta JP;
Biochem Cell Biol 1995;73:959-968.
This family includes archaebacterial L12, eukaryotic P0, P1 and P2.

[1679] 738. 6-phosphogluconate dehydrogenases

6-phosphogluconate dehydrogenase (EC 1.1.1.44) (6PGD) catalyzes the third step in the hexose monophosphate shunt, the decarboxylating reduction of 6-phosphogluconate into ribulose 5-phosphate.
[1680] Prokaryotic and eukaryotic 6PGD are proteins of about 470 amino acids whose sequences are highly conserved [1]. A region which has been shown [2], from studies of the sheep 6PGD tertiary structure, to be involved in the binding of 6-phosphogluconate has been selected as a signature pattern.

- Consensus pattern: [LIVM]-x-D-x(2)-[GA]-[NQS]-K-G-T-G-x-W

[ 1] Reizer A., Deutscher J., Saier M.H. Jr., Reizer J.
Mol. Microbiol. 5:1081-1089(1991).

[ 2] Adams M.J., Archibald I.G., Bugg C.E., Carne A., Gover S., Helliwell J.R., Pickersgill R.W., White S.W.
EMBO J. 2:1009-1014(1983).

[1681] 739. (7tm_1) G-protein coupled receptors [1 to 4,E1,E2] (also called R7G) are an extensive group of hormone, neurotransmitter, odorant and light receptors which transduce extracellular signals by interaction with guanine nucleotide-binding (G) proteins. The receptors that are currently known to belong to this family are listed below.

- 5-hydroxytryptamine (serotonin) 1A to 1F, 2A to 2C, 4, 5A, 5B, 6 and 7 [5].
- Acetylcholine, muscarinic-type, M1 to M5.
- Adenosine A1, A2A, A2B and A3 [6].
- Adrenergic alpha-1A to -1C; alpha-2A to -2D; beta-1 to -3 [7].
- Angiotensin II types I and II.
- Bombesin subtypes 3 and 4.
- Bradykinin B1 and B2.
- C3a and C5a anaphylatoxin.
- Cannabinoid CB1 and CB2.
- Chemokines C-C CC-CKR-1 to CC-CKR
- Chemokines C-X-C CXC-CKR-1 to CXC-CKR-4.
- Cholecystokinin-A and cholecystokinin-B/gastrin.
- Dopamine D1 to D5 [8].
- Endothelin ET-a and ET-b [9].
- fMet-Leu-Phe (fMLP) (N-formyl peptide).
- Follicle stimulating hormone (FSH-R) [10].
- Galanin.
- Gastrin-releasing peptide (GRP-R).
- Gonadotropin-releasing hormone (GNRH-R).
- Histamine H1 and H2 (gastric receptor I).
- Lutropin-choriogonadotropic hormone (LSH-R) [10].
- Melanocortin MC1R to MC5R.
- Melatonin.
- Neuromedin B (NMB-R).
- Neuromedin K (NK-3R).
- Neuropeptide Y types 1 to 6.
- Neurotensin (NT-R).
- Octopamine (tyramine), from insects.
- Odorants [11].
- Opioids delta-, kappa- and mu-types [12].
- Oxytocin (OT-R).
- Platelet activating factor (PAF-R).
- Prostacyclin.
- Prostaglandin D2.
- Prostaglandin E2, EP1 to EP4 subtypes.
- Prostaglandin F2.
- Purinoreceptors (ATP) [13].
- Somatostatin types 1 to 5.
- Substance-K (NK-2R).
- Substance-P (NK-1R).
- Thrombin.
- Thromboxane A2.
- Thyrotropin (TSH-R) [10].
- Thyrotropin releasing factor (TRH-R).
- Vasopressin V1a, V1b and V2.
- Visual pigments (opsins and rhodopsin) [14].
- Proto-oncogene mas.
- A number of orphan receptors (whose ligand is not known) from mammals and birds.
- Caenorhabditis elegans putative receptors C06G4.5, C38C10.1, C43C3.2, T27D1.3 and ZC84.4.
- Three putative receptors encoded in the genome of cytomegalovirus: US27, US28, and UL33.
- ECRF3, a putative receptor encoded in the genome of herpesvirus saimiri.

10 

[1682] The structure of all these receptors is thought to be identical. They have seven hydrophobic regions, each of
which most probably spans the membrane. The N-terminus is located on the extracellular side of the membrane and
is often glycosylated, while the C-terminus is cytoplasmic and generally phosphorylated. Three extracellular loops
alternate with three intracellular loops to link the seven transmembrane regions. Most, but not all, of these receptors
lack a signal peptide. The most conserved parts of these proteins are the transmembrane regions and the first two
cytoplasmic loops. A conserved acidic-Arg-aromatic triplet is present in the N-terminal extremity of the second
cytoplasmic loop [15] and could be implicated in the interaction with G proteins.

[1683] To detect this widespread family of proteins, a pattern that contains the conserved triplet and that also spans
the major part of the third transmembrane helix has been developed.

- Consensus pattern: [GSTALIVMFYWC]-[GSTANCPDE]-{EDPKRH}-x(2)-[LIVMNQGA]-x(2)-[LIVMFT]-[GSTANC]-
[LIVMFYWSTAC]-[DENH]-R-[FYWCSH]-x(2)-[LIVM]

[ 1] Strosberg A.D.
Eur. J. Biochem. 196:1-10(1991).

[ 2] Kerlavage A.R.
Curr. Opin. Struct. Biol. 1:394-401(1991).

[ 3] Probst W.C., Snyder L.A., Schuster D.I., Brosius J., Sealfon S.C.
DNA Cell Biol. 11:1-20(1992).

[ 4] Savarese T.M., Fraser C.M.
Biochem. J. 283:1-9(1992).

[ 5] Branchek T.
Curr. Biol. 3:315-317(1993).

[ 6] Stiles G.L.
J. Biol. Chem. 267:6451-6454(1992).

[ 7] Friell T., Kobilka B.K., Lefkowitz R.J., Caron M.G.
Trends Neurosci. 11:321-324(1988).

[ 8] Stevens C.F.
Curr. Biol. 1:20-22(1991).

[ 9] Sakurai T., Yanagisawa M., Masaki T.
Trends Pharmacol. Sci. 13:103-107(1992).

[10] Salesse R., Remy J.J., Levin J.M., Jallal B., Garnier J.
Biochimie 73:109-120(1991).

[11] Lancet D., Ben-Arie N.
Curr. Biol. 3:668-674(1993).

[12] Uhl G.R., Childers S., Pasternak G.
Trends Neurosci. 17:88-93(1994).

[13] Barnard E.A., Burnstock G., Webb T.E.
Trends Pharmacol. Sci. 15:67-70(1994).

[14] Applebury M.L., Hargrave P.A.
Vision Res. 26:1881-1895(1986).

[15] Attwood T.K., Eliopoulos E.E., Findlay J.B.C.
Gene 98:153-159(1991).

[1684] 7tm_1. Visual pigments (opsins) retinal binding site

Visual pigments [1,2] are the light-absorbing molecules that mediate vision. They consist of an apoprotein, opsin,
covalently linked to the chromophore cis-retinal. Vision is effected through the absorption of a photon by cis-retinal,
which is isomerized to trans-retinal. This isomerization leads to a change of conformation of the protein. Opsins are
integral membrane proteins with seven transmembrane regions that belong to family 1 of G-protein coupled receptors.
[1685] In vertebrates four different pigments are generally found. Rod cells, which mediate vision in dim light, contain
the pigment rhodopsin. Cone cells, which function in bright light, are responsible for color vision and contain three or
more color pigments (for example, in mammals: red, blue and green).
[1686] In Drosophila, the eye is composed of 800 facets or ommatidia. Each ommatidium contains eight photoreceptor
cells (R1-R8): the R1 to R6 cells are outer cells, R7 and R8 inner cells. Each of the three types of cells (R1-R6,
R7 and R8) expresses a specific opsin.

[1687] Proteins evolutionarily related to opsins include squid retinochrome, also known as retinal photoisomerase,
which converts various isomers of retinal into 11-cis retinal, and mammalian retinal pigment epithelium (RPE) RGR [3],
a protein that may also act in retinal isomerization.

[1688] The attachment site for retinal in the above proteins is a conserved lysine residue in the middle of the seventh
transmembrane helix. The pattern that has been developed includes this residue.

- Consensus pattern: [LIVMWAC]-[PGC]-x(3)-[SAC]-K-[STALIMR]-[GSACPNV]-[STACP]-x(2)-[DENF]-[AP]-x(2)-
[IY]

[K is the retinal binding site]

[ 1] Applebury M.L., Hargrave P.A.
Vision Res. 26:1881-1895(1986).
[ 2] Fryxell K.J., Meyerowitz E.M.

J. Mol. Evol. 33:367-378(1991).

[ 3] Shen D., Jiang M., Hao W., Tao L., Salazar M., Fong H.K.W.
Biochemistry 33:13117-13125(1994).

[1689] The following descriptions of protein family functions are not provided by the Pfam or Prosite databases.
[1690] 740. BAH

BAH domain. Number of members: 65

[1] Medline: 97074877. Molecular cloning of polybromo, a nuclear protein containing multiple domains including
five bromodomains, a truncated HMG-box, and two repeats of a novel domain. Nicolas RH, Goodwin GH; Gene
1998;175:233-240.

[2] Medline: 99198739. The BAH (bromo-adjacent homology) domain: a link between DNA methylation, replication
and transcriptional regulation. Callebaut I, Courvalin J-C, Mornon JP; FEBS Lett 1999;446:189-193.

[1691] 741. ELM2.

ELM2 domain. The ELM2 (Egl-27 and MTA1 homology 2) domain is a small domain of unknown function. Number of
members: 10

[1692] 742. Euk_porin. EUKARYOTIC_PORIN The major protein of the outer mitochondrial membrane of eukaryotes
is a porin that forms a voltage-dependent anion-selective channel (VDAC) that behaves as a general diffusion pore
for small hydrophilic molecules [1 to 4]. The channel adopts an open conformation at low or zero membrane potential
and a closed conformation at potentials above 30-40 mV.

This protein contains about 280 amino acids and its sequence is composed of between 12 to 16 beta-strands that span
the mitochondrial outer membrane. Yeast contains two members of this family (genes POR1 and POR2); vertebrates
have at least three members (genes VDAC1, VDAC2 and VDAC3) [5].
A conserved region located at the C-terminal part of these proteins was selected as a signature pattern.
Consensus pattern[YH]-x(2)-D-[SPCAD]-x-^ 
(LiVMY) 

[ 1] Benz R. Biochim. Biophys. Acta 1197:167-196(1994).
[ 2] Mannella C.A. Trends Biochem. Sci. 17:315-320(1992).

[ 3] Dihanich M. Experientia 48:146-153(1990).

[ 4] Forte M., Guy H.R., Mannella C.A. J. Bioenerg. Biomembr. 19:341-350(1987).

[ 5] Sampson M.J., Lovell R.S., Davison D.B., Craigen W.J. Genomics 36:192-196(1996).

[1693] 743. Glyco_hydro_19
Chitinases family 19 signatures

cross-reference(s) CHITINASE_19_1, CHITINASE_19_2

Chitinases (EC 3.2.1.14) [1] are enzymes that catalyze the hydrolysis of the beta-1,4-N-acetyl-D-glucosamine linkages
in chitin polymers. From the viewpoint of sequence similarity, chitinases belong to either family 18 or 19 in the
classification of glycosyl hydrolases [2,E1]. Chitinases of family 19 (also known as classes IA or I and IB or II) are enzymes
from plants that function in the defense against fungal and insect pathogens by destroying their chitin-containing cell
wall. Class IA/I and IB/II enzymes differ in the presence (IA/I) or absence (IB/II) of an N-terminal chitin-binding domain
(see the relevant entry <PDOC00023>). The catalytic domain of these enzymes consists of about 220 to 230 amino
acid residues.

Two highly conserved regions were selected as signature patterns. The first one is located in the N-terminal section
and contains one of the six cysteines which are conserved in most, if not all, of these chitinases and which is probably
involved in a disulfide bond.
Consensus pattern: C-x(4,5)-F-Y-[ST]-x(3)-[FY]-[LIVMF]-x-A-x(3)-[YF]-x(2)-F-[GSA]
[1694] Consensus pattern: [LIVM]-[GSA]-F-x-[STAG](2)-[LIVMFY]-W-[FY]-W-[LIVM]

[1] Flach J., Pilet P.-E., Jolles P. Experientia 48:701-716(1992).
[2] Henrissat B. Biochem. J. 280:309-316(1991).


[1695] 744. MBD
Methyl-CpG binding domain

The Methyl-CpG binding domain (MBD) binds to DNA that contains one or more symmetrically methylated CpGs [1].
DNA methylation in animals is associated with alterations in chromatin structure and silencing of gene expression.
MBD has negligible non-specific affinity for DNA. In vitro foot-printing with MeCP2 showed the MBD can protect a 12
nucleotide region surrounding a methyl CpG pair [1]. MBDs are found in several Methyl-CpG binding proteins and also
DNA demethylase [2]. Number of members: 11

[1] Medline: °42^2b13. Dissection of the methyl-CpG binding domain from the chromosomal protein MeCP2. Nan
X, Meehan RR, Bird A; Nucleic Acids Res 1993;21

[2] Medline: 1M*1 , J .8. A mammalian protein with specific demethylase activity for mCpG DNA. Bhattacharya SK,
Ramchandani S, Cervoni N, Szyf M; Nature 1999;397:579-583.

[1696] 745. Peptidase_C1
Eukaryotic thiol (cysteine) proteases active sites

cross-reference(s) THIOL_PROTEASE_CYS, THIOL_PROTEASE_HIS,
THIOL_PROTEASE_ASN

Eukaryotic thiol proteases (EC 3.4.22.-) [1] are a family of proteolytic enzymes which contain an active site cysteine.
Catalysis proceeds through a thioester intermediate and is facilitated by a nearby histidine side chain; an asparagine
completes the essential catalytic triad. The proteases which are currently known to belong to this family are listed below
(references are only provided for recently determined sequences).

- Vertebrate lysosomal cathepsins B (EC 3.4.22.1), H (EC 3.4.22.16), L (EC 3.4.22.15), and S (EC 3.4.22.27) [2].
- Vertebrate lysosomal dipeptidyl peptidase I (EC 3.4.14.1) (also known as cathepsin C) [2].
- Vertebrate calpains (EC 3.4.22.17). Calpains are intracellular calcium-activated thiol proteases that contain both
an N-terminal catalytic domain and a C-terminal calcium-binding domain.
- Mammalian cathepsin K, which seems involved in osteoclastic bone resorption [3].
- Human cathepsin O [4].
- Bleomycin hydrolase. An enzyme that catalyzes the inactivation of the antitumor drug BLM (a glycopeptide).
- Plant enzymes: barley aleurain (EC 3.4.22.16), EP-B1/B4; kidney bean EP-C1; rice bean SH-EP; kiwi fruit actinidin
(EC 3.4.22.14); papaya latex papain (EC 3.4.22.2), chymopapain (EC 3.4.22.6), caricain (EC 3.4.22.30), and
proteinase IV (EC 3.4.22.25); pea turgor-responsive protein 15A; pineapple stem bromelain (EC 3.4.22.32); rape
COT44; rice oryzain alpha, beta, and gamma; tomato low-temperature induced; Arabidopsis thaliana A494, RD19A
and RD21A.
- House-dust mites allergens DerP1 and EurM1.
- Cathepsin B-like proteinases from the worms Caenorhabditis elegans (genes gcp-1, cpr-3, cpr-4, cpr-5 and
cpr-6), Schistosoma mansoni (antigen SM31) and Japonica (antigen SJ31), Haemonchus contortus (genes AC-1 and
AC-2), and Ostertagia ostertagi (CP-1 and CP-3).
- Slime mold cysteine proteinases CP1 and CP2.
- Cruzipain from Trypanosoma cruzi and brucei.
- Throphozoite cysteine proteinase (TCP) from various Plasmodium species.
- Proteases from Leishmania mexicana, Theileria annulata and Theileria parva.
- Baculoviruses cathepsin-like enzyme (v-cath).
- Drosophila small optic lobes protein (gene sol), a neuronal protein that contains a calpain-like domain.
- Yeast thiol protease BLH1/YCP1/LAP3.
- Caenorhabditis elegans hypothetical protein C06G4.2, a calpain-like protein.

[1697] Two bacterial peptidases are also part of this family:

- Aminopeptidase C from Lactococcus lactis (gene pepC) [5].
- Thiol protease tpr from Porphyromonas gingivalis.

[1698] Three other proteins are structurally related to this family, but may have lost their proteolytic activity:

- Soybean oil body protein P34. This protein has its active site cysteine replaced by a glycine.
- Rat testin, a Sertoli cell secretory protein highly similar to cathepsin L but with the active site cysteine replaced
by a serine. Rat testin should not be confused with mouse testin, which is a LIM-domain protein (see
<PDOC00382>).
- Plasmodium falciparum serine-repeat protein (SERA), the major blood stage antigen. This protein of 111 Kd
possesses a C-terminal thiol-protease-like domain [6], but the active site cysteine is replaced by a serine.

The sequences around the three active site residues are well conserved and can be used as signature patterns.
[1699] Consensus pattern: Q-x(3)-[GE]-x-C-[YW]-x(2)-[STAGC]-[STAGCV] [C is the active site residue]

Note: the residue in position 4 of the pattern is almost always cysteine; the only exceptions are calpains (Leu), bleomycin
hydrolase (Ser) and yeast YCP1 (Ser). Note: the residue in position 5 of the pattern is always Gly except in papaya
protease IV where it is Glu.
Consensus pattern: [LIVMGSTAN]-x-H-[GSACE]-[LIVM]-x-[LIVMAT](2)-G-x-[GSADNH] [H
is the active site residue]

25 Consensus pattern[FYCHHWIHHVT]-xHXR^^ [N 
is the active site residue] 

Note: these proteins belong to families C1 (papain-type) and C2 (calpains) in the classification of peptidases [7,E1].

[ 1] Dufour E. Biochimie 70:1335-1342(1988).
[ 2] Kirschke H., Barrett A.J., Rawlings N.D. Protein Prof. 2:1587-1643(1995).

[ 3] Shi G.-P., Chapman H.A., Bhairi S.M., Deleeuw C., Reddy V.Y., Weiss S.J. FEBS Lett. 357:129-134(1995).
[ 4] Velasco G., Ferrando A.A., Puente X.S., Sanchez L.M., Lopez-Otin C. J. Biol. Chem. 269:27136-27142(1994).
[ 5] Chapot-Chartier M.P., Nardi M., Chopin M.C., Chopin A., Gripon J.C. Appl. Environ. Microbiol. 59:330-333
(1993).

[ 6] Higgins D.G., McConnell D.J., Sharp P.M. Nature 340:604-604(1989).

[ 7] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:461-486(1994).

[1700] 746. Peptidase_M22

Glycoprotease family signature cross-reference(s) GLYCOPROTEASE
Glycoprotease (GCP) (EC 3.4.24.57) [1], or o-sialoglycoprotein endopeptidase, is a metalloprotease secreted by
Pasteurella haemolytica which specifically cleaves O-sialoglycoproteins such as glycophorin A. The sequence of GCP is
highly similar to the following uncharacterized proteins:

- Escherichia coli hypothetical protein ygjD (ORF-X).
- Bacillus subtilis hypothetical protein ydiE.
- Mycobacterium leprae hypothetical protein U229E.
- Mycobacterium tuberculosis hypothetical protein MtCY78.10.
- Synechocystis strain PCC 6803 hypothetical protein slr0807.
- Methanococcus jannaschii hypothetical protein MJ1120.
- Haloarcula marismortui hypothetical protein in HSH 3'region.
- Yeast hypothetical protein YKR038c.
- Yeast hypothetical protein QRI7.

[1701] One of the conserved regions contains two conserved histidines. It is possible that this region is involved in
coordinating a metal ion such as zinc.

[1702] Consensus pattern: [KR]-[GSAT]-x(4)-[FYWLH]-[DQNGK]-x-P-x-[LIVMFY]-x(3)-H-x(2)-[AG]-H-[LIVM]
Note: these proteins belong to family M22 in the classification of peptidases [2,E1].
[ 1] Abdullah K.M., Lo R.Y.C., Mellors A. J. Bacteriol. 173:5597-5603(1991).
[ 2] Rawlings N.D., Barrett A.J. Meth. Enzymol. 248:183-228(1995).

[1703] 747. SAM. SAM domain (Sterile alpha motif)
It has been suggested that SAM is an evolutionarily conserved protein binding domain that is involved in the regulation
of numerous developmental processes in diverse eukaryotes. The SAM domain can potentially function as a protein
interaction module through its ability to homo- and heterooligomerise with other SAM domains. Number of members: b1

[1] Medline: 9810065&. SAM: A novel motif in yeast sterile alpha and Drosophila polyhomeotic proteins. Ponting CP;
Prot Sci 1995;4:1928-1930.

[2] Medline: 9~1^040b. SAM as a protein interaction domain involved in developmental regulation. Schultz J, Ponting
CP, Hofmann K, Bork P; Prot Sci 1997;6:249-253.

[3] Medline: 99101382. The crystal structure of an Eph receptor SAM domain reveals a mechanism for modular
dimerization. Reference author: Stapleton D, Balan I, Pawson T, Sicheri F; Nat Struct Biol 1999;6:44-49.


[1704] 748. Tyrosinase signatures. Cross-reference(s) TYROSINASE_1, TYROSINASE_2. Tyrosinase (EC 1.14.18.1)
[1] is a copper monooxygenase that catalyzes the hydroxylation of monophenols and the oxidation of o-diphenols to
o-quinols. This enzyme, found in prokaryotes as well as in eukaryotes, is involved in the formation of pigments such
as melanins and other polyphenolic compounds.
[1705] Tyrosinase binds two copper ions (CuA and CuB). Each of the two copper ions has been shown [2] to be bound
by three conserved histidine residues. The regions around these copper-binding ligands are well conserved and also
shared by some hemocyanins, which are copper-containing oxygen carriers from the hemolymph of many molluscs
and arthropods [3,4].

[1706] At least two proteins related to tyrosinase are known to exist in mammals:

- TRP-1 (TYRP1) [5], which is responsible for the conversion of 5,6-dihydroxyindole-2-carboxylic acid (DHICA) to
indole-5,6-quinone-2-carboxylic acid.
- TRP-2 (TYRP2) [6], which is the melanogenic enzyme DOPAchrome tautomerase (EC 5.3.3.12) that catalyzes
the conversion of DOPAchrome to DHICA. TRP-2 differs from tyrosinases and TRP-1 in that it binds two zinc ions
instead of copper [7].

[1707] Other proteins that belong to this family are:

- Plants polyphenol oxidases (PPO) (EC 1.10.3.1), which catalyze the oxidation of mono- and o-diphenols to
o-diquinones [8].
- Caenorhabditis elegans hypothetical protein C02C2.1.

[1708] Two signature patterns for tyrosinase and related proteins have been derived. The first one contains two of
the histidines that bind CuA, and is located in the N-terminal section of tyrosinase. The second pattern contains a
histidine that binds CuB; that pattern is located in the central section of the enzyme.

Consensus pattern: H-x(4,5)-F-[LIVMFTP]-x-[FW]-H-R-x(2)-[LM]-x(3)-E [The two H's are copper ligands]
[1709] Consensus pattern: D-P-x-F-[LIVMFYW]-x(2)-H-x(3)-D [H is a copper ligand]
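
The consensus patterns quoted throughout this section use the PROSITE notation: residues separated by hyphens, x for any residue, [..] for a set of allowed residues, {..} for a set of excluded residues, and (n) or (n,m) for repeat counts. As a minimal sketch, and not the procedure used by the database itself, such a pattern can be translated into a regular expression and run over a sequence. The function name prosite_to_regex and the test peptide are assumptions made for illustration; the peptide is invented and is not a real tyrosinase fragment.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style consensus pattern into a Python regex string."""
    tokens = []
    for element in pattern.split("-"):
        # Separate an optional repeat suffix such as "(3)" or "(4,5)".
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = match.groups()
        if core == "x":                # x means any residue
            token = "."
        elif core.startswith("{"):     # {ABC} means any residue except A, B, C
            token = "[^" + core[1:-1] + "]"
        else:                          # single residue or [ABC] residue class
            token = core
        if low and high:
            token += "{" + low + "," + high + "}"
        elif low:
            token += "{" + low + "}"
        tokens.append(token)
    return "".join(tokens)


# Second tyrosinase signature (CuB-binding histidine), as quoted in the text.
CUB_PATTERN = "D-P-x-F-[LIVMFYW]-x(2)-H-x(3)-D"
CUB_REGEX = prosite_to_regex(CUB_PATTERN)

# Invented peptide, used only to exercise the function.
toy_sequence = "AADPAFLGGHAAADKK"
hit = re.search(CUB_REGEX, toy_sequence)
print(CUB_REGEX, hit.start(), hit.group())
```

The same helper handles the excluded-residue form used in other patterns of this section, since {..} is rewritten as a negated character class.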

[ 1] Lerch K. Prog. Clin. Biol. Res. 256:85-98(1988).
[ 2] Jackman M.P., Hajnal A., Lerch K. Biochem. J. 274:707-713(1991).

[ 3] Linzen B. Naturwissenschaften 76:208-211(1989).

[ 4] Lang W.H., van Holde K.E. Proc. Natl. Acad. Sci. U.S.A. 88:244-248(1991).

[ 5] Kobayashi T., Urabe K., Winder A., Jimenez-Cervantes C., Imokawa G., Brewington T., Solano F.,
Garcia-Borron J.C., Hearing V.J. EMBO J. 13:5818-5825(1994).
[ 6] Jackson I.J., Chambers D.M., Tsukamoto K., Copeland M.G., Gilbert D.J., Jenkins N.A., Hearing V. EMBO J.
11:527-535(1992).

[ 7] Solano F., Martinez-Liarte J.H., Jimenez-Cervantes C., Garcia-Borron J.C., Lozano J.A. Biochem. Biophys.
Res. Commun. 204:1243-1250(1994).

[ 8] Cary J.W., Lax A.R., Flurkey W.H. Plant Mol. Biol. 20:245-253(1992).


[1710] 749. (Mur_ligase) Folylpolyglutamate synthase signatures

Folylpolyglutamate synthase (EC 6.3.2.17) (FPGS) [1] is the enzyme of folate metabolism that catalyzes ATP-dependent
addition of glutamate moieties to tetrahydrofolate.
[1711] Its sequence is moderately conserved between prokaryotes (gene folC) and eukaryotes. We developed two
signature patterns based on the conserved regions, which are rich in glycine residues and could play a role in the
catalytic activity and/or in substrate binding.
Description of pattern(s) and/or profile(s)
Consensus pattern: [LIVMFY]-x-[LIVM]-[STAG]-G-T-[NK]-G-K-x-[ST]-x(7)-[LIVM](2)-x(3)-[GSK]
Consensus pattern: [LIVMFY](2)-E-x-G-[LIVM]-[GA]-G-x(2)-D-x-[GST]-x-[LIVM](2)

[1712] [1] Shane B., Garrow T., Brenner A., Chen L., Choi Y.J., Hsu J.C., Stover P. Adv. Exp. Med. Biol. 629-634
(1993).

[1713] 750. (Peptidase tW-) Neutral zinc metallopeptidases, zinc-binding region signature
The majority of zinc-dependent metallopeptidases (with the notable exception of the carboxypeptidases) share a common
pattern of primary structure [1,2,3] in the part of their sequence involved in the binding of zinc, and can be grouped
together as a superfamily, known as the metzincins, on the basis of this sequence similarity. They can be classified into
a number of distinct families [4,E1] which are listed below along with the proteases which are currently known to belong
to these families.
[1714] Family M1

- Bacterial aminopeptidase N (EC 3.4.11.2) (gene pepN).
- Mammalian aminopeptidase N (EC 3.4.11.2).
- Mammalian glutamyl aminopeptidase (EC 3.4.11.7) (aminopeptidase A). It may play a role in regulating growth
and differentiation of early B-lineage cells.
- Yeast aminopeptidase yscII (gene APE2).
- Yeast alanine/arginine aminopeptidase (gene AAP1).
- Yeast hypothetical protein YIL137C.
- Leukotriene A-4 hydrolase (EC 3.3.2.6). This enzyme is responsible for the hydrolysis of an epoxide moiety of
LTA-4 to form LTB-4; it has been shown that it binds zinc and is capable of peptidase activity.

[1715] Family M2

- Angiotensin-converting enzyme (EC 3.4.15.1) (dipeptidyl carboxypeptidase I) (ACE), the enzyme responsible for
hydrolyzing angiotensin I to angiotensin II. There are two forms of ACE: a testis-specific isozyme and a somatic
isozyme which has two active centers.

[1716] Family M3

- Thimet oligopeptidase (EC 3.4.24.15), a mammalian enzyme involved in the cytoplasmic degradation of small
peptides.
- Neurolysin (EC 3.4.24.16) (also known as mitochondrial oligopeptidase M or microsomal endopeptidase).
- Mitochondrial intermediate peptidase precursor (EC 3.4.24.59) (MIP). It is involved in the second stage of processing
of some proteins imported in the mitochondrion.
- Yeast saccharolysin (EC 3.4.24.37) (proteinase yscD).
- Escherichia coli and related bacteria dipeptidyl carboxypeptidase (EC 3.4.15.5) (gene dcp).
- Escherichia coli and related bacteria oligopeptidase A (EC 3.4.24.70) (gene opdA or prlC).
- Yeast hypothetical protein YKL134c.

[1717] Family M4

- Thermostable thermolysins (EC 3.4.24.27), and related thermolabile neutral proteases (bacillolysins) (EC
3.4.24.28) from various species of Bacillus.
- Pseudolysin (EC 3.4.24.26) from Pseudomonas aeruginosa (gene lasB).
- Extracellular elastase from Staphylococcus epidermidis.
- Extracellular protease prt1 from Erwinia carotovora.
- Extracellular minor protease smp from Serratia marcescens.
- Vibriolysin (EC 3.4.24.25) from various species of Vibrio.
- Protease prtA from Listeria monocytogenes.
- Extracellular proteinase proA from Legionella pneumophila.

[1718] Family M5

- Mycolysin (EC 3.4.24.31) from Streptomyces cacaoi.

[1719] Family M6

- Immune inhibitor A from Bacillus thuringiensis (gene ina). Ina degrades two classes of insect antibacterial proteins,
attacins and cecropins.

[1720] Family M7

- Streptomyces intracellular small neutral proteases.

[1721] Family M8

- Leishmanolysin (EC 3.4.24.36) (surface glycoprotein gp63), a cell surface protease from various species of
Leishmania.

[1722] Family M9

- Microbial collagenase (EC 3.4.24.3) from Clostridium perfringens and Vibrio alginolyticus.

[1723] Family M10A

- Serralysin (EC 3.4.24.40), an extracellular metalloprotease from Serratia.
- Alkaline metalloproteinase from Pseudomonas aeruginosa (gene aprA).
- Secreted proteases A, B, C and G from Erwinia chrysanthemi.
- Yeast hypothetical protein YIL108w.

[1724] Family M10B

- Mammalian extracellular matrix metalloproteinases (known as matrixins) [5]: MMP-1 (EC 3.4.24.7) (interstitial
collagenase), MMP-2 (EC 3.4.24.24) (72 Kd gelatinase), MMP-9 (EC 3.4.24.35) (92 Kd gelatinase), MMP-7 (EC
3.4.24.23) (matrilysin), MMP-8 (EC 3.4.24.34) (neutrophil collagenase), MMP-3 (EC 3.4.24.17) (stromelysin-1),
MMP-10 (EC 3.4.24.22) (stromelysin-2), MMP-11 (stromelysin-3), and MMP-12 (EC 3.4.24.65) (macrophage
metalloelastase).
- Sea urchin hatching enzyme (envelysin) (EC 3.4.24.12). A protease that allows the embryo to digest the protective
envelope derived from the egg extracellular matrix.
- Soybean metalloendoproteinase 1.

[1725] Family M11

- Chlamydomonas reinhardtii gamete lytic enzyme (GLE).

[1726] Family M12A

- Astacin (EC 3.4.24.21), a crayfish endoprotease.
- Meprin A (EC 3.4.24.18), a mammalian kidney and intestinal brush border metalloendopeptidase.
- Bone morphogenic protein 1 (BMP-1), a protein which induces cartilage and bone formation and which expresses
metalloendopeptidase activity. The Drosophila homolog of BMP-1 is the dorsal-ventral patterning protein tolloid.
- Blastula protease 10 (BP10) from Paracentrotus lividus and the related protein SpAN from Strongylocentrotus
purpuratus.
- Caenorhabditis elegans protein toh-2.
- Caenorhabditis elegans hypothetical protein F42A10.8.
- Choriolysins L and H (EC 3.4.24.67) (also known as embryonic hatching proteins LCE and HCE) from the fish
Oryzias latipes. These proteases participate in the breakdown of the egg envelope, which is derived from the
egg extracellular matrix, at the time of hatching.

[1727] Family M12B

- Snake venom metalloproteinases [6]. This subfamily mostly groups proteases that act in hemorrhage. Examples
are: adamalysin II (EC 3.4.24.46), atrolysin C/D (EC 3.4.24.42), atrolysin E (EC 3.4.24.44), fibrolase (EC 3.4.24.72),
trimerelysin I (EC 3.4.24.52) and II (EC 3.4.24.53).
- Mouse cell surface antigen MS2.

[1728] Family M13

- Mammalian neprilysin (EC 3.4.24.11) (neutral endopeptidase) (NEP).
- Endothelin-converting enzyme 1 (EC 3.4.24.71) (ECE-1), which processes the precursor of endothelin to release the
active peptide.
- Kell blood group glycoprotein, a major antigenic protein of erythrocytes. The Kell protein is very probably a zinc
endopeptidase.
- Peptidase O from Lactococcus lactis (gene pepO).

[1729] Family M27

- Clostridial neurotoxins, including tetanus toxin (TeTx) and the various botulinum toxins (BoNT). These toxins are
zinc proteases that block neurotransmitter release by proteolytic cleavage of synaptic proteins such as
synaptobrevins, syntaxin and SNAP-25 [7,8].

[1730] Family M30

- Staphylococcus hyicus neutral metalloprotease.

[1731] Family M32

- Thermostable carboxypeptidase 1 (EC 3.4.17.19) (carboxypeptidase Taq), an enzyme from Thermus aquaticus
which is most active at high temperature.

[1732] Family M34

- Lethal factor (LF) from Bacillus anthracis, one of the three proteins composing the anthrax toxin.

[1733] Family M35

- Deuterolysin (EC 3.4.24.39) from Penicillium citrinum and related proteases from various species of Aspergillus.

[1734] Family M38

- Extracellular elastinolytic metalloproteinases from Aspergillus.

[1735] From the tertiary structure of thermolysin, the position of the residues acting as zinc ligands and those involved
in the catalytic activity are known. Two of the zinc ligands are histidines which are very close together in the sequence;
C-terminal to the first histidine is a glutamic acid residue which acts as a nucleophile and promotes the attack of a
water molecule on the carbonyl carbon of the substrate. A signature pattern which includes the two histidines and the
glutamic acid residue is sufficient to detect this superfamily of proteins.
[1736] Description of pattern(s) and/or profile(s)

Consensus pattern: [GSTALIVN]-x(2)-H-E-[LIVMFYW]-{DEHRKP}-H-x-[LIVMFYWGSPQ] [The
two H's are zinc ligands] [E is the active site residue]
Sequences known to belong to this class detected by the pattern: ALL,
except for members of families M5, M7 and M11.
Other sequence(s) detected in SWISS-PROT, including Neurospora
crassa conidiation-specific protein 13, which could be a
zinc-protease.
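
As an illustration of how the zinc-binding signature localizes the functional residues, the sketch below hard-codes a regular-expression reading of the pattern [GSTALIVN]-x(2)-H-E-[LIVMFYW]-{DEHRKP}-H-x-[LIVMFYWGSPQ] and reports, for each match, the 1-based positions of the two histidine ligands and of the glutamic acid. The function name find_zinc_sites, the dictionary keys, and the test sequence are assumptions made for this example; the sequence is a toy fragment built to contain one match, not a database entry.

```python
import re

# Regex reading of the signature; {DEHRKP} becomes a negated class.
ZINC_SIGNATURE = re.compile(
    r"[GSTALIVN].{2}(?P<his1>H)(?P<glu>E)[LIVMFYW][^DEHRKP](?P<his2>H).[LIVMFYWGSPQ]"
)


def find_zinc_sites(sequence):
    """Return one record per signature match, using 1-based residue positions."""
    hits = []
    for m in ZINC_SIGNATURE.finditer(sequence):
        hits.append({
            "start": m.start() + 1,
            # The two histidines of the H-E-x-x-H core are the zinc ligands.
            "zinc_ligands": (m.start("his1") + 1, m.start("his2") + 1),
            # The glutamate next to the first histidine is the active site residue.
            "active_site": m.start("glu") + 1,
        })
    return hits


# Toy fragment (illustrative only).
toy_sequence = "MKKAVVAHELTHAVTDYTAG"
print(find_zinc_sites(toy_sequence))
```

A sequence from families M5, M7 or M11 would be expected to return an empty list here, in line with the exceptions noted above, although that is not tested by this sketch.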


[ 1] Jongeneel C.V., Bouvier J., Bairoch A.
FEBS Lett. 242:211-214(1989).

[ 2] Murphy G.J.P., Murphy G., Reynolds J.J.
FEBS Lett. 289:4-7(1991).

[ 3] Bode W., Grams F., Reinemer P., Gomis-Rueth F.-X., Baumann U., McKay
D.B., Stoecker W.
Zoology 99:237-246(1996).

[ 4] Rawlings N.D., Barrett A.J.
Meth. Enzymol. 248:183-228(1995).

[ 5] Woessner J. Jr.
FASEB J. 5:2145-2154(1991).

[ 6] Hite L.A., Fox J.W., Bjarnason J.B.

[ 7] Montecucco C., Schiavo G.
Trends Biochem. Sci. 18:324-327(1993).

[ 8] Niemann H., Blasi J., Jahn R.
Trends Cell Biol. 4:179-185(1994).

[1737] 751. PseudoU_synth_1

tRNA pseudouridine synthase is involved in the formation of pseudouridine at the anticodon stem and loop of
transfer-RNAs. Pseudouridine is an isomer of uridine (5-(beta-D-ribofuranosyl) uracil), and is the most abundant modified
nucleoside found in all cellular RNAs. The TruA-like proteins also exhibit a conserved sequence with a strictly conserved
aspartic acid, likely involved in catalysis. Number of members: 25

[1738] [1] Medline: 98254513. Transfer RNA-pseudouridine synthetase Pus1 of Saccharomyces cerevisiae contains
one atom of zinc essential for its native conformation and tRNA recognition. Arluison V, Hountondji C, Robert B,
Grosjean H; Biochemistry 1998;37:7268-7276.
[1739] 752. EPSP synthase signatures

EPSP synthase (3-phosphoshikimate 1-carboxyvinyltransferase) (EC 2.5.1.19) catalyzes the sixth step in the
biosynthesis from chorismate of the aromatic amino acids (the shikimate pathway) in bacteria (gene aroA), plants and fungi
(where it is part of a multifunctional enzyme which catalyzes five consecutive steps in this pathway) [1]. EPSP synthase
has been extensively studied as it is the target of the potent herbicide glyphosate, which inhibits the enzyme.

[1740] The sequence of EPSP from various biological sources shows that the structure of the enzyme has been well
conserved throughout evolution. Two conserved regions were selected as signature patterns. The first pattern
corresponds to a region that is part of the active site and which is also important for the resistance to glyphosate [2]. The
second pattern is located in the C-terminal part of the protein and contains a conserved lysine which seems to be
important for the activity of the enzyme.

[1741] Description of pattern(s) and/or profile(s)

[1742] Consensus pattern: [LIVM]-x(2)-[GN]-N-[SA]-G-T-[STA]-x-R-x-[LIVMY]-x-[GSTA]
Consensus pattern: [KR]-x-[KH]-E-[CST]-[DNE]-R-[LIVM]-x-[STA]-[LIVMC]-x(2)-[EN]-[LIVMF]-x-[KRA]-[LIVMF]-G

[ 1] Stallings W.C., Abdel-Meguid S.S., Lim L.W., Shieh H.-S., Dayringer H.E., Leimgruber N.K., Stegeman R.A.,
Anderson K.S., Sikorski J.A., Padgette S.R., Kishore G.M. Proc. Natl. Acad. Sci. U.S.A. 88:5046-5050(1991).
[ 2] Padgette S.R., Re D.B., Gasser C.S., Eichholtz D.A., Frazier R.B., Hironaka C.M., Levine E.B., Shah D.M., Fraley
R.T., Kishore G.M. J. Biol. Chem. 266:22364-22369(1991).

[1743] 753. Glyco_hydro_18

Glycosyl hydrolases family 18. Number of members: 1 173

[1] Medline: 95219379. Crystal structure of a bacterial chitinase at 2.3 A resolution. Perrakis A, Tews I, Dauter Z,
Oppenheim AB, Chet I, Wilson KS, Vorgias CE; Structure 1994;2:1169-1180.
[1744] 754. Esterase
Putative esterase

This family contains Esterase D (Swiss: P10768). However, it is not clear if all members of the family have the same
function. This family is possibly related to the COesterase family.
Number of members: 36

[1745] 755. (HMA) Heavy-metal-associated domain

A conserved domain of about 30 amino acid residues has been found [1] in a number of proteins that transport or
detoxify heavy metals. This domain contains two conserved cysteines that could be involved in the binding of these
metals. The domain has been termed Heavy-Metal-Associated (HMA). It has been found in:

- A variety of cation transport ATPases (E1-E2 ATPases) (see <PDOC00139>). The human copper ATPases ATP7A
and ATP7B, which are respectively involved in Menke's and Wilson's diseases. ATP7A and ATP7B both contain 6
tandem copies of the HMA domain. The copper ATPases CCC2 from budding yeast, copA from Enterococcus
faecalis and synA from Synechococcus contain one copy of the HMA domain. The cadmium ATPases cadA from
Bacillus firmus and from plasmid pI258 from Staphylococcus aureus also contain a single HMA domain, while a
chromosomal Staphylococcus aureus cadA contains two copies. Other less characterized ATPases that contain
the HMA domain are: fixI from Rhizobium meliloti, pacS from Synechococcus (strain PCC 7942), Mycobacterium
leprae ctpA and ctpB and Escherichia coli hypothetical protein yhhO. In all these ATPases the HMA domain(s) are
located in the N-terminal section.

- Mercuric reductase (EC 1.16.1.1) (gene merA), which is generally encoded by plasmids carried by mercury-resistant
Gram-negative bacteria. Mercuric reductase is a class-1 pyridine nucleotide-disulphide oxidoreductase (see
<PDOC00073>). There is generally one HMA domain (with the exception of a chromosomal merA from Bacillus
strain RC607 which has two) in the N-terminal part of merA.

- Mercuric transport protein periplasmic component (gene merP), also encoded by plasmids carried by mercury-
resistant Gram-negative bacteria. It seems to be a mercury scavenger that specifically binds to one Hg(2+) ion
and which passes it to the mercuric reductase via the merT protein. The N-terminal half of merP is a HMA domain.

- Helicobacter pylori copper-binding protein copP.

- Yeast protein ATX1 [2], which could act in the transport and/or partitioning of copper.

[1746] The consensus pattern for HMA spans the complete domain.
[1747] Description of pattern(s) and/or profile(s)

Consensus pattern [LIVN]-x(2)-[LIVMFA]-x-C-x-[STAGCDNH]-C-x(3)-[LIVFG]-x(3)-[LIV]-x(9,11)-[IVA]-x-[LVFYS] [The
two C's probably bind metals]
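Consensus patterns of this kind can be checked against a sequence by translating them into regular expressions. The following is a minimal sketch rather than part of the original disclosure. It assumes the HMA pattern reads as given in the constant `HMA_PATTERN` (a reading of the printed text), and the helper name `prosite_to_regex` and the test sequence are hypothetical, made up purely for illustration.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        # Split off an optional repeat count such as (2) or (9,11).
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, lo, hi = m.group(1), m.group(2), m.group(3)
        if core == "x":
            core = "."                      # 'x' stands for any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # '{..}' lists excluded residues
        # '[ABC]' classes and single letters are already valid regex syntax.
        if lo and hi:
            core += "{%s,%s}" % (lo, hi)
        elif lo:
            core += "{%s}" % lo
        parts.append(core)
    return "".join(parts)

# Assumed reading of the HMA consensus pattern.
HMA_PATTERN = ("[LIVN]-x(2)-[LIVMFA]-x-C-x-[STAGCDNH]-C-x(3)-[LIVFG]"
               "-x(3)-[LIV]-x(9,11)-[IVA]-x-[LVFYS]")
hma_regex = prosite_to_regex(HMA_PATTERN)

# Synthetic (made-up) sequence built element by element to satisfy the pattern.
synthetic = ("L" + "AA" + "L" + "A" + "C" + "A" + "S" + "C" + "AAA" + "L"
             + "AAA" + "L" + "A" * 9 + "I" + "A" + "L")
hit = re.search(hma_regex, synthetic)
# Replacing both cysteines removes the match, since the pattern requires them.
miss = re.search(hma_regex, synthetic.replace("C", "A"))
```

The converter handles only the elements that occur in the patterns of this section (single residues, `[...]` classes, `{...}` exclusions, `x`, and repeat counts); anchors such as `<` and `>` are deliberately left out.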

[1] Bull P.C., Cox D.W. Trends Genet. 10:246-252(1994).

[2] Lin S.-J., Culotta V.C. Proc. Natl. Acad. Sci. U.S.A. 92:3784-3788(1995).

[1748] 756. (Peptidase_M10) Matrixins cysteine switch
PROSITE cross-reference(s): CYSTEINE_SWITCH

Mammalian extracellular matrix metalloproteinases (EC 3.4.24.-), also known as matrixins [1] (see <PDOC00129>),
are zinc-dependent enzymes. They are secreted by cells in an inactive form (zymogen) that differs from the mature
enzyme by the presence of an N-terminal propeptide. A highly conserved octapeptide is found two residues downstream
of the C-terminal end of the propeptide. This region has been shown to be involved in autoinhibition of matrixins [2,3]:
a cysteine within the octapeptide chelates the active site zinc ion, thus inhibiting the enzyme. This region has been
called the 'cysteine switch' or 'autoinhibitor region'.
A cysteine switch has been found in the following zinc proteases:

- MMP-1 (EC 3.4.24.7) (interstitial collagenase).
- MMP-2 (EC 3.4.24.24) (72 Kd gelatinase).
- MMP-3 (EC 3.4.24.17) (stromelysin-1).
- MMP-7 (EC 3.4.24.23) (matrilysin).
- MMP-8 (EC 3.4.24.34) (neutrophil collagenase).
- MMP-9 (EC 3.4.24.35) (92 Kd gelatinase).
- MMP-10 (EC 3.4.24.22) (stromelysin-2).
- MMP-11 (EC 3.4.24.-) (stromelysin-3).
- MMP-12 (EC 3.4.24.65) (macrophage metalloelastase).
- MMP-13 (EC 3.4.24.-) (collagenase 3).
- MMP-14 (EC 3.4.24.-) (membrane-type matrix metalloproteinase 1).
- MMP-15 (EC 3.4.24.-) (membrane-type matrix metalloproteinase 2).
- MMP-16 (EC 3.4.24.-) (membrane-type matrix metalloproteinase 3).
- Sea urchin hatching enzyme (EC 3.4.24.12) (envelysin) [4].
- Chlamydomonas reinhardtii gamete lytic enzyme (GLE) [5].

[1749] Description of pattern(s) and/or profile(s)

Consensus pattern P-R-C-[GN]-x-P-[DR]-[LIVSAPKQ] [C chelates the zinc ion]

[1] Woessner J.F. Jr. FASEB J. 5:2145-2154(1991).
[2] Sanchez-Lopez R., Nicholson R., Gesnel M.-C., Matrisian L.M., Breathnach R. J. Biol. Chem. 263:11892-11899
(1988).
[3] Park A.J., Matrisian L.M., Kells A.F., Pearson R., Yuan Z., Navre M. J. Biol. Chem. 266:1584-1590(1991).
[4] Lepage T., Gache C. EMBO J. 9:3003-3012(1990).
[5] Kinoshita T., Fukuzawa H., Shimada T., Saito T., Matsuda Y. Proc. Natl. Acad. Sci. U.S.A. 89:4693-4697(1992).
[1750] 757. (Peptidase_S8) Serine proteases, subtilase family, active sites

PROSITE cross-reference(s): PS00136; SUBTILASE_ASP, PS00137; SUBTILASE_HIS, PS00138; SUBTILASE_SER
Subtilases [1,2] are an extensive family of serine proteases whose catalytic activity is provided by a charge relay system
similar to that of the trypsin family of serine proteases but which evolved by independent convergent evolution. The
sequences around the residues involved in the catalytic triad (aspartic acid, serine and histidine) are completely different
from those of the analogous residues in the trypsin serine proteases and can be used as signatures specific to that
category of proteases.
The subtilase family currently includes the following proteases:

- Subtilisins (EC 3.4.21.62), these alkaline proteases from various Bacillus species have been the target of numerous
studies in the past thirty years.
- Alkaline elastase YaB from Bacillus sp. (gene ale).
- Alkaline serine exoprotease A from Vibrio alginolyticus (gene proA).
- Aqualysin I from Thermus aquaticus (gene pstI).
- AspA from Aeromonas salmonicida.
- Bacillopeptidase F (esterase) from Bacillus subtilis (gene bpf).
- C5A peptidase from Streptococcus pyogenes (gene scpA).
- Cell envelope-located proteases PI, PII, and PIII from Lactococcus lactis.
- Extracellular serine protease from Serratia marcescens.
- Extracellular protease from Xanthomonas campestris.
- Intracellular serine protease (ISP) from various Bacillus.
- Minor extracellular serine protease epr from Bacillus subtilis (gene epr).
- Minor extracellular serine protease vpr from Bacillus subtilis (gene vpr).
- Nisin leader peptide processing protease nisP from Lactococcus lactis.
- Serotype-specific antigen 1 from Pasteurella haemolytica (gene ssa1).
- Thermitase (EC 3.4.21.66) from Thermoactinomyces vulgaris.
- Calcium-dependent protease from Anabaena variabilis (gene prcA).
- Halolysin from halophilic bacteria sp. 172p1 (gene hly).
- Alkaline extracellular protease (AEP) from Yarrowia lipolytica (gene xpr2).
- Alkaline proteinase from Cephalosporium acremonium (gene alp).
- Cerevisin (EC 3.4.21.48) (vacuolar protease B) from yeast (gene PRB1).
- Cuticle-degrading protease (pr1) from Metarhizium anisopliae.
- KEX-1 protease from Kluyveromyces lactis.
- Kexin (EC 3.4.21.61) from yeast (gene KEX-2).
- Oryzin (EC 3.4.21.63) (alkaline proteinase) from Aspergillus (gene alp).
- Proteinase K (EC 3.4.21.64) from Tritirachium album (gene proK).
- Proteinase R from Tritirachium album (gene proR).
- Proteinase T from Tritirachium album (gene proT).
- Subtilisin-like protease III from yeast (gene YSP3).
- Thermomycolin (EC 3.4.21.65) from Malbranchea sulfurea.
- Furin (EC 3.4.21.75), neuroendocrine convertases 1 to 3 (NEC-1 to -3) and PACE4 protease from mammals, other
vertebrates, and invertebrates. These proteases are involved in the processing of hormone precursors at sites
comprised of pairs of basic amino acid residues [3].
- Tripeptidyl-peptidase II (EC 3.4.14.10) (tripeptidyl aminopeptidase) from Human.
- Prestalk-specific proteins tagB and tagC from slime mold [4]. Both proteins consist of two domains: a N-terminal
subtilase catalytic domain and a C-terminal ABC transporter domain (see <PDOC00185>).

[1751] Description of pattern(s) and/or profile(s)

Consensus pattern [STAIV]-x-[LIVMF]-[LIVM]-D-[DSTA]-G-[LIVMFC]-x(2,3)-[DNH] [D is the active site residue]
Consensus pattern H-G-[STM]-x-[VIC]-[STAGC]-[GS]-x-[LIVMA]-[STAGCLV]-[SAGM] [H is the active site residue]
Consensus pattern G-T-S-x-[SA]-x-P-x(2)-[STAVC]-[AG] [S is the active site residue]

Note: if a protein includes at least two of the three active site signatures, the probability of it being a serine protease
from the subtilase family is 100%.
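The two-of-three rule in the note above can be sketched as follows. This is an illustrative sketch only: the three regular expressions are hand translations of the signatures as read from the printed text, the function names are hypothetical, and the test fragments are synthetic strings built to satisfy each signature, not real protein sequences.

```python
import re

# Hand-translated regexes for the three subtilase active-site signatures
# (assumed readings of the printed consensus patterns).
SUBTILASE_SIGNATURES = {
    "ASP": re.compile(r"[STAIV].[LIVMF][LIVM]D[DSTA]G[LIVMFC].{2,3}[DNH]"),
    "HIS": re.compile(r"HG[STM].[VIC][STAGC][GS].[LIVMA][STAGCLV][SAGM]"),
    "SER": re.compile(r"GTS.[SA].P.{2}[STAVC][AG]"),
}

def matched_signatures(sequence):
    """Names of the signatures found anywhere in the sequence."""
    return [name for name, rx in SUBTILASE_SIGNATURES.items()
            if rx.search(sequence)]

def looks_like_subtilase(sequence):
    """Apply the rule from the note: at least two of the three signatures."""
    return len(matched_signatures(sequence)) >= 2

# Synthetic fragments, one per signature (made up for illustration).
asp_site = "SALIDDGLAAD"
his_site = "HGTAVAGALAS"
ser_site = "GTSASAPAASA"

two_sites = asp_site + "KKK" + his_site
one_site = "KKK" + asp_site + "KKK"
```

Here `looks_like_subtilase(two_sites)` is true and `looks_like_subtilase(one_site)` is false, mirroring the threshold stated in the note.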

Note: these proteins belong to family S8 in the classification of peptidases [5,E1].

[1] Siezen R.J., de Vos W.M., Leunissen J.A.M., Dijkstra B.W. Protein Eng. 4:719-737(1991).
[2] Siezen R.J. (In) Proceeding subtilisin symposium, Hamburg, (1992).
[3] Barr P.J. Cell 66:1-3(1991).
[4] Shaulsky G., Kuspa A., Loomis W.F. Genes Dev. 9:1111-1122(1995).
[5] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:19-61(1994).

[1752] 758. (SSB) Single-strand binding protein family signatures
PROSITE cross-reference(s): PS00735; SSB_1, PS00736; SSB_2

The Escherichia coli single-strand binding protein [1] (gene ssb), also known as the helix-destabilizing protein, is a
protein of 177 amino acids. It binds tightly, as a homotetramer, to single-stranded DNA (ss-DNA) and plays an important
role in DNA replication, recombination and repair.

[1753] Closely related variants of SSB are encoded in the genome of a variety of large self-transmissible plasmids.
SSB has also been characterized in bacteria such as Proteus mirabilis or Serratia marcescens.
[1754] Eukaryotic mitochondrial proteins that bind ss-DNA and are probably involved in mitochondrial DNA replication
are structurally and evolutionarily related to prokaryotic SSB. Proteins currently known to belong to this subfamily are
listed below [2].

- Mammalian protein Mt-SSB (P18).
- Xenopus Mt-SSBs and Mt-SSBr.
- Drosophila MtSSB.
- Yeast protein RIM1.

[1755] Two signature patterns have been developed for these proteins. The first is a conserved region in the N-
terminal section of the SSB's. The second is a centrally located region which, in Escherichia coli SSB, is known to be
involved in the binding of DNA.
[1756] Description of pattern(s) and/or profile(s)

Consensus pattern [LIVMF]-[NST]-[KRT]-[LIVM]-x-[LIVMF](2)-G-[NHRK]-[LIVM]-[GST]-x-[DET]
Consensus pattern T-x-W-[HY]-[RNS]-[LIVM]-x-[LIVMF]-[FY]-[NGKR]

[1] Meyer R.R., Laine P.S. Microbiol. Rev. 54:342-380(1990).
[2] Stroumbakis N.D., Li Z., Tolias P.P. Gene 143:171-177(1994).

[1757] 759. KDPG and KHG aldolases active site signatures

PROSITE cross-reference(s): PS00159; ALDOLASE_KDPG_KHG_1, PS00160; ALDOLASE_KDPG_KHG_2
[1758] 4-hydroxy-2-oxoglutarate aldolase (EC 4.1.3.16) (KHG-aldolase) catalyzes the interconversion of 4-hydroxy-
2-oxoglutarate into pyruvate and glyoxylate. Phospho-2-dehydro-3-deoxygluconate aldolase (EC 4.1.2.14) (KDPG-
aldolase) catalyzes the interconversion of 6-phospho-2-dehydro-3-deoxy-D-gluconate into pyruvate and glyceraldehyde
3-phosphate.

[1759] These two enzymes are structurally and functionally related [1]. They are both homotrimeric proteins of
approximately 220 amino-acid residues. They are class I aldolases whose catalytic mechanism involves the formation
of a Schiff-base intermediate between the substrate and the epsilon-amino group of a lysine residue. In both enzymes,
an arginine is required for catalytic activity.

[1760] Two signature patterns were developed for these enzymes. The first one contains the active site arginine and
the second, the lysine involved in the Schiff-base formation.

[1761] Description of pattern(s) and/or profile(s)
Consensus pattern G-[LIVM]-x(3)-E-[LIV]-T-[LF]-R [R is the active site residue]
Consensus pattern G-x(3)-[LIVMF]-K-[LF]-F-P-[SA]-x(3)-G [K is involved in Schiff-base formation]
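Because each of these two signatures is annotated with a specific catalytic residue, a match can also report where that residue sits in the sequence. The sketch below is illustrative only: the regular expressions are hand translations of the patterns as read from the printed text, the named group `site` and the function name are hypothetical, and the test sequence is synthetic.

```python
import re

# Assumed readings of the two KDPG/KHG aldolase signatures; the annotated
# residue (R or K) is captured in a named group so its position is recoverable.
ARG_SITE = re.compile(r"G[LIVM].{3}E[LIV]T[LF](?P<site>R)")
LYS_SITE = re.compile(r"G.{3}[LIVMF](?P<site>K)[LF]FP[SA].{3}G")

def annotated_sites(sequence):
    """Return (residue, 1-based position) for every signature hit."""
    hits = []
    for rx in (ARG_SITE, LYS_SITE):
        for m in rx.finditer(sequence):
            hits.append((m.group("site"), m.start("site") + 1))
    return hits

# Synthetic sequence (made up): two filler residues, an arginine-signature
# fragment, three filler residues, then a lysine-signature fragment.
synthetic = "MM" + "GLAAAELTLR" + "AAA" + "GAAALKLFPSAAAG"
sites = annotated_sites(synthetic)
```

Positions are reported 1-based to follow the usual convention for residue numbering.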

[1762] [1] Vlahos C.J., Dekker E.E. J. Biol. Chem. 263:11683-11691(1988).

[1763] 760. AP endonucleases family 1 signatures
PROSITE cross-reference(s): PS00726; AP_NUCLEASE_F1_1, PS00727; AP_NUCLEASE_F1_2, PS00728;
AP_NUCLEASE_F1_3

[1764] DNA damaging agents such as the antitumor drugs bleomycin and neocarzinostatin or those that generate
oxygen radicals produce a variety of lesions in DNA. Amongst these is base-loss, which forms apurinic/apyrimidinic
(AP) sites, or strand breaks with atypical 3' termini. DNA repair at the AP sites is initiated by specific endonuclease
cleavage of the phosphodiester backbone. Such endonucleases are also generally capable of removing blocking
groups from the 3' terminus of DNA strand breaks.

[1765] AP endonucleases can be classified into two families on the basis of sequence similarity. Family 1 groups
the enzymes listed below [1].
- Escherichia coli exonuclease III (EC 3.1.11.2) (gene xthA).
- Streptococcus pneumoniae and Bacillus subtilis exonuclease A (gene exoA).
- Mammalian AP endonuclease 1 (AP1) (EC 4.2.99.18).
- Drosophila recombination repair protein 1 (gene Rrp1).
- Arabidopsis thaliana apurinic endonuclease-redox protein (gene arp).

[1766] Except for Rrp1 and arp, these enzymes are proteins of about 300 amino-acid residues. Rrp1 and arp both
contain additional and unrelated sequences in their N-terminal section (about 400 residues for Rrp1 and 270 for arp).
[1767] Three signature patterns were developed for this family of enzymes. The patterns are based on the most
conserved regions. The first pattern contains a glutamate which has been shown [2], in the Escherichia coli enzyme,
to bind a divalent metal ion such as magnesium or manganese.

[1768] Consensus pattern [APF]-D-[LIVMF](2)-x-[LIVM]-Q-E-x-K [E binds a divalent metal ion]
Consensus pattern D-[ST]-[FY]-R-[KH]-x(7,8)-[FYW]-[ST]-[FYW](2)
Consensus pattern N-x-G-x-R-[LIVM]-D-[LIVMFYH]-x-[LV]-x-S

[1] Barzilay G., Hickson I.D. BioEssays 17:713-719(1995).

[2] Mol C.D., Kuo C.-F., Thayer M.M., Cunningham R.P., Tainer J.A. Nature 374:381-386(1995).

[1769] 761. (ER) Enhancer of rudimentary signature. PROSITE cross-reference(s): PS01290; ER

[1770] The Drosophila protein 'enhancer of rudimentary' (gene e(r)) is a small protein of 104 residues whose function
is not yet clear. From an evolutionary point of view, it is highly conserved [1] and has been found to exist in probably
all multicellular eukaryotic organisms. It has been proposed that this protein plays a role in the cell cycle.

[1771] A conserved region in the central part of the protein was selected as a signature pattern.

[1772] Consensus pattern Y-D-I-[SA]-x-L-[FY]-x-F-[IV]-D-x(3)-D-[LIV]-S

[1773] [1] Gelsthorpe M., Pulumati M., McCallum C., Dang-Vu K., Tsubota S.I. Gene 188:189-195(1997).

[1774] 762. (ETF_alpha) Electron transfer flavoprotein alpha-subunit signature. PROSITE cross-reference(s):
PS00696; ETF_ALPHA

[1775] The electron transfer flavoprotein (ETF) [1,2] serves as a specific electron acceptor for various mitochondrial
dehydrogenases. ETF transfers electrons to the main respiratory chain via ETF-ubiquinone oxidoreductase. ETF is an
heterodimer that consists of an alpha and a beta subunit and which binds one molecule of FAD per dimer. A similar
system also exists in some bacteria.

[1776] The alpha subunit of ETF is a protein of about 32 Kd which is structurally related to the bacterial nitrogen
fixation protein fixB, which could play a role in a redox process and feed electrons to ferredoxin.
[1777] Other related proteins are:

- Escherichia coli hypothetical protein ydiR.
- Escherichia coli hypothetical protein ygcQ.

[1778] A highly conserved region which is located in the C-terminal section was selected as a signature pattern for
these proteins.

[1779] Consensus pattern [LI]-Y-[LIVM]-[AT]-x-G-[IV]-[SD]-G-x-[IV]-Q-H-x(2)-G-x(6)-[IV]-x-A-[IV]-N

[1] Finocchiaro G., Ikeda Y., Ito M., Tanaka K. Prog. Clin. Biol. Res. 321:637-652(1990).
[2] Tsai M.H., Saier M.H. Jr. Res. Microbiol. 146:397-404(1995).

[1780] 763. (lectin_c) C-type lectin domain signature and profile

PROSITE cross-reference(s): PS00615; C_TYPE_LECTIN_1, PS50041; C_TYPE_LECTIN_2
[1781] A number of different families of proteins share a conserved domain which was first characterized in some
animal lectins and which seems to function as a calcium-dependent carbohydrate-recognition domain [1,2,3]. This
domain, which is known as the C-type lectin domain (CTL) or as the carbohydrate-recognition domain (CRD), consists
of about 110 to 130 residues. There are four cysteines which are perfectly conserved and involved in two disulfide
bonds. A schematic representation of the CTL domain is shown below.

xcxxxxcxxxxxxxCxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxCxxxxWxCxxxxCx

'C': conserved cysteine involved in a disulfide bond.
'c': optional cysteine involved in a disulfide bond.
'*': position of the pattern.



[1782] The categories of proteins in which the CTL domain has been found are listed below.

[1783] Type-II membrane proteins where the CTL domain is located at the C-terminal extremity of the proteins:

- Asialoglycoprotein receptors (ASGPR) (also known as hepatic lectins) [4]. The ASGPR's mediate the endocytosis
of plasma glycoproteins to which the terminal sialic acid residue in their carbohydrate moieties has been removed.
- Low affinity immunoglobulin epsilon Fc receptor (lymphocyte IgE receptor), which plays an essential role in the
regulation of IgE production and in the differentiation of B cells.
- Kupffer cell receptor. A receptor with an affinity for galactose and fucose, that could be involved in endocytosis.
- A number of proteins expressed on the surface of natural killer T-cells: NKR-P1, YE1/88 (Ly-49), CD69
and on B-cells: CD72, LyB-2. The CTL-domain in these proteins is distantly related to other CTL-domains; it is
unclear whether they are likely to bind carbohydrates.

[1784] Proteins that consist of an N-terminal collagenous domain followed by a CTL-domain [5]; these proteins are
sometimes called 'collectins':

- Pulmonary surfactant-associated protein A (SP-A). SP-A is a calcium-dependent protein that binds to surfactant
phospholipids and contributes to lower the surface tension at the air-liquid interface in the alveoli of the mammalian
lung.
- Pulmonary surfactant-associated protein D (SP-D).
- Conglutinin, a calcium-dependent lectin-like protein which binds to a yeast cell wall extract and to immune
complexes through the complement component (iC3b).
- Mannan-binding proteins (MBP) (also known as mannose-binding proteins).
MBP's bind mannose and N-acetyl-D-glucosamine in a calcium-dependent manner.
- Bovine collectin-43 (CL-43).

[1785] Selectins (or LEC-CAM) [6,7]. Selectins are cell adhesion molecules implicated in the interaction of leukocytes
with platelets or vascular endothelium. Structurally, selectins consist of a long extracellular domain, followed by a
transmembrane region and a short cytoplasmic domain. The extracellular domain is itself composed of a CTL-domain,
followed by an EGF-like domain and a variable number of SCR/Sushi repeats. Known selectins are:

- Lymph node homing receptor (also known as L-selectin, leukocyte adhesion
molecule-1 (LAM-1), leu-8, gp90-mel, or LECAM-1).






- Endothelial leukocyte adhesion molecule 1 (ELAM-1, E-selectin or LECAM-2).
[1786] The ligand recognized by ELAM-1 is sialyl-Lewis x.
- Granule membrane protein 140 (GMP-140, P-selectin, PADGEM, CD62, or LECAM-3).
The ligand recognized by GMP-140 is Lewis x.

[1787] Large proteoglycans that contain a CTL-domain followed by one copy of a SCR/Sushi repeat in their C-
terminal section:

- Aggrecan (cartilage-specific proteoglycan core protein). This proteoglycan is a major component of the extracellular
matrix of cartilagenous tissues, where it has a role in the resistance to compression.
- Brevican.
- Neurocan.
- Versican (large fibroblast proteoglycan), a large chondroitin sulfate proteoglycan that may play a role in intercellular
signalling.

[1788] In addition to the CTL and Sushi domains, these proteins also contain, in their N-terminal domain, an Ig-like
V-type region, two or four link domains (see <PDOC00955>) and up to two EGF-like repeats.
[1789] Two type-I membrane proteins:

- Mannose receptor from macrophages. This protein mediates the endocytosis of
glycoproteins by macrophages in several recognition and uptake processes. Its extracellular section consists of a
fibronectin type II domain followed by eight tandem repeats of the CTL domain.
- 180 Kd secretory phospholipase A2 receptor (PLA2-R). A protein whose
structure is highly similar to that of the mannose receptor.
- DEC-205 receptor. This protein is used by dendritic cells and thymic epithelial cells to capture and endocytose
diverse carbohydrate-binding antigens and direct them to antigen-processing cellular compartments. DEC-205
extracellular section consists of a fibronectin type II domain followed by ten tandem repeats of the CTL domain.
- Silk moth hemocytin, an humoral lectin which is involved in a self-defence mechanism. It is composed of 2 FA58C
domains (see <PDOC00988>), a CTL domain, 2 VWFC domains (see <PDOC00928>), and a CTCK (see <PDOC00912>).
[1790] Various other proteins that uniquely consist of a CTL domain:

- Invertebrate soluble galactose-binding lectins. A category to which belong a humoral lectin from a flesh fly;
echinoidin, a lectin from the coelomic fluid of a sea urchin; BRA-2 and BRA-3, two lectins from the coelomic fluid of a
barnacle; a lectin from the tunicate Polyandrocarpa misakiensis and a newt oviduct lectin. The physiological
importance of these lectins is not yet known but they may play an important role in defense mechanisms.
- Pancreatic stone protein (PSP) (also known as pancreatic thread protein (PTP), or reg), a protein that might act
as an inhibitor of spontaneous calcium carbonate precipitation.
- Pancreatitis associated protein (PAP), a protein that might be involved in the control of bacterial proliferation.
- Tetranectin, a plasma protein that binds to plasminogen and to isolated kringle 4.
- Eosinophil granule major basic protein (MBP), a cytotoxic protein.
- A galactose specific lectin from a rattlesnake.
- Two subunits of a coagulation factor IX/factor X-binding protein (IX/X-bp), a snake venom anticoagulant protein
which binds with factors IX and X in the presence of calcium.
- Two subunits of a phospholipase A2 inhibitor from the plasma of a snake (PLI-A and PLI-B).
- A lipopolysaccharide-binding protein (LPS-BP) from the hemolymph of a
cockroach [8].
- Sea raven antifreeze protein (AFP) [9].

[1791] As a signature pattern for this domain, the C-terminal region with its three conserved cysteines was selected.
[1792] Consensus pattern C-[LIVMFYATG]-x(5,12)-[WL]-x-[DNSR]-x(2)-C-x(5,6)-[FYWLIVSTA]-[LIVMSTA]-C [The
three C's are involved in disulfide bonds]

Note: all CTL domains have five Trp residues before the second Cys, with the exception of tunicate lectin and cockroach
LPS-BP which have Leu.

Note: this documentation entry is linked to both a signature pattern and a profile. As the profile is much more sensitive
than the pattern, you should use it if you have access to the necessary software tools to do so.

[1] Drickamer K. J. Biol. Chem. 263:9557-9560(1988).
[2] Drickamer K. Prog. Nucleic Acid Res. Mol. Biol. 45:207-232(1993).
[3] Drickamer K. Curr. Opin. Struct. Biol. 3:393-400(1993).
[4] Spiess M. Biochemistry 29:10009-10018(1990).
[5] Weis W.I., Kahn R., Fourme R., Drickamer K., Hendrickson W.A. Science 254:1608-1615(1991).
[6] Siegelman M. Curr. Biol. 1:125-128(1991).
[7] Lasky L.A. Science 258:964-969(1992).
[8] Jomori T., Natori S. J. Biol. Chem. 266:13318-13323(1991).
[9] Ng N.F.L., Hew C.-L. J. Biol. Chem. 267:16069-16075(1992).

[1793] 764. (SRCR) Speract receptor repeated domain signature

PROSITE cross-reference(s): PS00420; SPERACT_RECEPTOR
[1794] The receptor for the sea urchin egg peptide speract is a transmembrane glycoprotein of 500 amino acid
residues [1]. Structurally it consists of a large extracellular domain of 450 residues, followed by a transmembrane
region and a small cytoplasmic domain of 12 amino acids. The extracellular domain contains four repeats of a 115
amino acids domain. There are 17 positions that are perfectly conserved in the four repeats; among them are six
cysteines, six glycines, and three glutamates.
[1795] Such a domain is also found, once, in the C-terminal section of mammalian macrophage scavenger receptor
type I [2], a membrane glycoprotein implicated in the pathologic deposition of cholesterol in arterial walls during
atherogenesis.

[1796] The signature pattern that was derived spans part of the N-terminal section of the domain and contains 8 of
the 17 conserved residues.
[1797] Consensus pattern G-x(5)-G-x(2)-E-x(6)-W-G-x(2)-C-x(3)-[FYW]-x(8)-C-x(3)-G
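The statement that the signature contains 8 of the 17 conserved residues can be checked directly against the pattern by counting its single-residue positions. The sketch below assumes the pattern reads as given in `SRCR_PATTERN` (a reading of the printed text); the function name `summarize` is hypothetical, and the parser supports only the element types that occur in this one pattern.

```python
import re

# Assumed reading of the speract receptor repeated domain signature.
SRCR_PATTERN = "G-x(5)-G-x(2)-E-x(6)-W-G-x(2)-C-x(3)-[FYW]-x(8)-C-x(3)-G"

def summarize(pattern):
    """Return (invariant positions, total residues spanned) for a pattern."""
    invariant = 0   # positions fixed to exactly one residue
    length = 0      # total number of residues covered by the pattern
    for element in pattern.split("-"):
        # Accept 'x', a single residue, or a '[...]' class, each with an
        # optional fixed repeat count such as (5).
        m = re.fullmatch(r"(x|[A-Z]|\[[A-Z]+\])(?:\((\d+)\))?", element)
        if m is None:
            raise ValueError("unsupported element: " + element)
        core, count = m.group(1), int(m.group(2) or 1)
        length += count
        if len(core) == 1 and core != "x":
            invariant += count
    return invariant, length

invariant_positions, span = summarize(SRCR_PATTERN)
```

Under this reading the pattern has eight invariant positions (five glycines, one glutamate, one tryptophan and two cysteines, with the tryptophan and `[FYW]` counted separately) across a span of 38 residues, which agrees with the count of 8 stated in the text.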

[1] Dangott L.J., Jordan J.E., Bellet R.A., Garbers D.L. Proc. Natl. Acad. Sci. U.S.A. 86:2128-2132(1989).

[2] Freeman M., Ashkenas J., Rees D.J., Kingsley D.M., Copeland N.G., Jenkins N.A., Krieger M. Proc. Natl.
Acad. Sci. U.S.A. 87:8810-8814(1990).

[1798] 765. Bac_surface_Ag
Bacterial surface antigen

This entry includes the following surface antigens: D15 antigen from H. influenzae, OMA87 from P. multocida, OMP85
from N. meningitidis and N. gonorrhoeae. Number of members: 14

[1] Medline: 95255676. The sequencing of the 89-kDa D15 protective surface antigen of Haemophilus influenzae.
Flack FS, Loosmore R, Chong P, Thomas WR; Gene 1995;158:97-99.

[2] Medline: 96333354. Cloning, sequencing, expression, and protective capacity of the oma87 gene encoding the
Pasteurella multocida 87-kilodalton outer membrane antigen. Ruffolo CG, Adler B; Infect Immun 1996;64:
3161-3167.

[1799] 766. BRCA1 C Terminus (BRCT) domain

The BRCT domain is found predominantly in proteins involved in cell cycle checkpoint functions responsive to DNA
damage. It has been suggested that the Retinoblastoma protein contains a divergent BRCT domain; this has not been
included in this family. The BRCT domain of XRCC1 forms a homodimer in the crystal structure (Medline:99016060).
This suggests that pairs of BRCT domains
associate as homo- or heterodimers. Number of members: 131

[1] Medline: 96259550. BRCA1 protein products ...Functional motifs... Koonin EV, Altschul SF, Bork P; Nature
Genet 1996;13:266-268.

[2] Medline: 97153217. From BRCA1 to RAP1: A widespread BRCT module closely associated with DNA repair.
Callebaut I, Mornon JP; Febs lett 1997;400:25-30.

[3] Medline: 97186552. A super-family of conserved domains in DNA damage responsive cell cycle checkpoint
proteins. Bork P, Hofmann K, Bucher P, Neuwald AF, Altschul SF, Koonin EV; Faseb J 1997;11:68-76.
[4] Medline: 97402527. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.
Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ; Nucleic Acids Res 1997;25:
3389-3402.

[5] Medline: 99016060. Structure of an XRCC1 BRCT domain: a new protein-protein interaction module. Zhang
X, Morera S, Bates PA, Whitehead PC, Coffer AI, Hainbucher K, Nash RA, Sternberg MJ, Lindahl T, Freemont PS
[1800] 767. Kappa casein

Kappa-casein is a mammalian milk protein involved in a number of important physiological processes. In the gut, the
ingested protein is split into an insoluble peptide (para kappa-casein) and a soluble hydrophilic glycopeptide
(caseinomacropeptide).

Caseinomacropeptide is responsible for increased efficiency of digestion, prevention of neonate hypersensitivity to
ingested proteins, and inhibition of gastric pathogens. Number of members: 56

[1801] [1] Medline: 98o y .?E0fl Nucleotide sequence evolution at the kappa-casein locus: evidence for positive
selection within the family Bovidae. Ward TJ, Honeycutt RL, Derr JN; Genetics 1997;147:1863-1872.
[1802] 768. Chitinases family 18 active site
PROSITE cross-reference(s): CHITINASE_18

Chitinases (EC 3.2.1.14) [1] are enzymes that catalyze the hydrolysis of the beta-1,4-N-acetyl-D-glucosamine linkages
in chitin polymers. From the view point of sequence similarity, chitinases belong to either family 18 or 19 in the
classification of glycosyl hydrolases [2,E1]. Chitinases of family 18 (also known as classes III or V) group a variety of proteins:

a) Chitinases from:

- Prokaryotes such as Alteromonas, Bacillus, Serratia, Streptomyces, etc.
- Plants such as Arabidopsis, cucumber, bean, tobacco, etc.
- Fungi such as Aphanocladium, Rhizopus, Saccharomyces, etc.
- Nematode (Brugia malayi).
- Insects (Manduca sexta).
- Baculoviruses (Autographa californica nuclear polyhedrosis virus).

b) Other proteins:

- Hevamine, a rubber tree protein with chitinase and lysozyme activities.
- Kluyveromyces lactis killer toxin alpha subunit, which acts as a chitinase.
- Flavobacterium and Streptomyces endo-beta-N-acetylglucosaminidases (EC 3.2.1.96).
- Mammalian di-N-acetylchitobiase, which is involved in the degradation of asparagine-linked glycoproteins.
- Human cartilage glycoprotein Gp-39.
- Jack bean concanavalin B (conB), a protein that has lost its catalytic activity.

[1803] Site-directed mutagenesis experiments [3] and crystallographic data [4,5] have shown that a conserved
glutamate is involved in the catalytic mechanism and probably acts as a proton donor. This glutamate is at the extremity of
the best conserved region in these proteins.

[1804] Consensus pattern [LIVMFY]-[DN]-G-[LIVMF]-[DN]-[LIVMF]-[DN]-x-E [E is the active site residue]

[1] Flach J., Pilet P.-E., Jolles P. Experientia 48:701-716(1992).
[2] Henrissat B. Biochem. J. 280:309-316(1991).
[3] Watanabe T., Kobori K., Miyashita K., Fujii T., Sakai H., Uchida M., Tanaka H. J. Biol. Chem. 268:18567-18572
(1993).
[4] Perrakis A., Tews I., Dauter Z., Oppenheim A.B., Chet I., Wilson K.S., Vorgias C.E. Structure 2:1169-1180
(1994).
[5] van Scheltinga A.C.T., Kalk K.H., Beintema J.J., Dijkstra B.W. Structure 2:1181-1189(1994).
[1805] 769. gag_p17. gag gene protein p17 (matrix protein).

The matrix protein forms an icosahedral shell associated with the inner membrane of the mature immunodeficiency
virus. Number of members: 1598

[1806] [1] Medline: 9bQb$7t>'' Three-dimensional structure of the human immunodeficiency virus type 1 matrix
protein. Massiah MA, Starich MR, Paschall C, Summers MF, Christensen AM, Sundquist WI; J Mol Biol 1994;244:198-223.

[1807] 770. GDA1/CD39 family of nucleoside phosphatases signature

PROSITE cross-reference(s): GDA1_CD39_NTPASE
A number of nucleoside diphosphate and triphosphate hydrolases as well as some yet uncharacterized proteins have
been found to belong to the same family [1,2]. This family currently consists of:

- Yeast guanosine-diphosphatase (EC 3.6.1.42) (GDPase) (gene GDA1). GDA1 is a golgi integral membrane
enzyme that catalyzes the hydrolysis of GDP to GMP.
- Potato apyrase (EC 3.6.1.5) (adenosine diphosphatase) (ADPase). Apyrase acts on both ATP and ADP to produce
AMP.
- Mammalian vascular ATP-diphosphohydrolase (EC 3.6.1.5) (also known as lymphoid cell activation antigen CD39).
- Toxoplasma gondii nucleoside-triphosphatases (EC 3.6.1.15) (NTPase). NTPase hydrolyses various nucleoside
triphosphates to produce the corresponding nucleoside mono- and diphosphates. This enzyme is secreted into
the invaded host cell into the parasitophorous vacuole, a specialized compartment where the parasite intracellularly
resides.
- Pea nucleoside-triphosphatases (EC 3.6.1.15) (NTPase).
- Caenorhabditis elegans hypothetical protein C33H5.14.
- Caenorhabditis elegans hypothetical protein R07E4.4.
- Yeast chromosome V hypothetical protein YER005w.

[1808] The above uncharacterized proteins all seem to be membrane-bound.

[1809] All these proteins share a number of conserved domains. The best conserved of these domains has been
selected. It is located in the central section of the proteins.

[1810] Consensus pattern [LIVM]-x-G-x(2)-E-G-x-[FY]-x-[FW]-[LIVA]-[TAG]-x-N-[HY]

[1] Handa M., Guidotti G. Biochem. Biophys. Res. Commun. 218:916-923(1996).

[2] Vasconcelos E.G., Ferreira S.T., de Carvalho T.M.U., de Souza W., Kettlun A.M., Mancilla M., Valenzuela M.
A., Verjovski-Almeida S. J. Biol. Chem. 271:22139-22145(1996).

[1811] 771. GTP cyclohydrolase I signatures
PROSITE cross-reference(s): GTP_CYCLOHYDROL_1_1, GTP_CYCLOHYDROL_1_2
GTP cyclohydrolase I (EC 3.5.4.16) catalyzes the biosynthesis of formic acid and dihydroneopterin triphosphate from GTP. This reaction is the first step in the biosynthesis of tetrahydrofolate in prokaryotes, of tetrahydrobiopterin in vertebrates, and of pteridine-containing pigments in insects.

[1812] GTP cyclohydrolase I is a protein of from 190 to 250 amino acid residues. The comparison of the sequence of the enzyme from bacterial and eukaryotic sources shows that the structure of this enzyme has been extremely well conserved throughout evolution [1].

[1813] Two conserved regions were selected as signature patterns. The first contains a perfectly conserved tetrapeptide which is part of the GTP-binding pocket [2]; the second region also contains conserved residues involved in GTP-binding.

[1814] Consensus pattern [DEN]-[LIVM](2)-x(2)-[KRNQ]-[DEN]-[LIVM]-x(3)-[ST]-x-C-E-H-H
Consensus pattern [SA]-x-[RK]-x-Q-[LIVM]-Q-E-[RN]-[LI]-[TSN]

[ 1] Maier J., Witter K., Guetlich M., Ziegler I., Werner T., Ninnemann H. Biochem. Biophys. Res. Commun. 212:705-711(1995).

[ 2] Nar H., Huber R., Meining W., Schmid C., Weinkauf S., Bacher A. Structure 3:459-466(1995).

[1815] 772. IlvC. Acetohydroxy acid isomeroreductase

Acetohydroxy acid isomeroreductase catalyses the conversion of acetohydroxy acids into dihydroxy valerates. This reaction is the second in the synthetic pathway of the essential branched side chain amino acids valine and isoleucine.

Number of members: 29

[1816] [1] Medline 97361822 The crystal structure of plant acetohydroxy acid isomeroreductase complexed with NADPH, two magnesium ions and a herbicidal transition state analog determined at 1.65 A resolution. Biou V, Dumas R, Cohen-Addad C, Douce R, Job D, Pebay-Peyroula E. EMBO J 1997;16:3405-3415.
[1817] 773. Prokaryotic membrane lipoprotein lipid attachment site

PROSITE cross-reference(s): PROKAR_LIPOPROTEIN

In prokaryotes, membrane lipoproteins are synthesized with a precursor signal peptide, which is cleaved by a specific lipoprotein signal peptidase (signal peptidase II). The peptidase recognizes a conserved sequence and cuts upstream of a cysteine residue to which a glyceride-fatty acid lipid is attached [1]. Some of the proteins known to undergo such processing currently include (for recent listings see [1,2,3]):

- Major outer membrane lipoprotein (murein-lipoproteins) (gene lpp).
- Escherichia coli lipoprotein-28 (gene nlpA).
- Escherichia coli lipoprotein-34 (gene nlpB).
- Escherichia coli lipoprotein nlpC.
- Escherichia coli lipoprotein nlpD.
- Escherichia coli osmotically inducible lipoprotein B (gene osmB).
- Escherichia coli osmotically inducible lipoprotein E (gene osmE).
- Escherichia coli peptidoglycan-associated lipoprotein (gene pal).
- Escherichia coli rare lipoproteins A and B (genes rplA and rplB).
- Escherichia coli copper homeostasis protein cutF (or nlpE).
- Escherichia coli plasmids traT proteins.
- Escherichia coli Col plasmids lysis proteins.
- A number of Bacillus beta-lactamases.
- Bacillus subtilis periplasmic oligopeptide-binding protein (gene oppA).
- Borrelia burgdorferi outer surface proteins A and B (genes ospA and ospB).
- Borrelia hermsii variable major protein 21 (gene vmp21) and 7 (gene vmp7).
- Chlamydia trachomatis outer membrane protein 3 (gene omp3).
- Fibrobacter succinogenes endoglucanase cel-3.
- Haemophilus influenzae proteins Pal and Pcp.
- Klebsiella pullulanase (gene pulA).
- Klebsiella pullulanase secretion protein pulS.
- Mycoplasma hyorhinis protein p37.
- Mycoplasma hyorhinis variant surface antigens A, B, and C (genes vlpABC).
- Neisseria outer membrane protein H.8.
- Pseudomonas aeruginosa lipopeptide (gene lppL).
- Pseudomonas solanacearum endoglucanase egl.
- Rhodopseudomonas viridis reaction center cytochrome subunit (gene cytC).
- Rickettsia 17 Kd antigen.
- Shigella flexneri invasion plasmid proteins mxiJ and mxiM.
- Streptococcus pneumoniae oligopeptide transport protein A (gene amiA).
- Treponema pallidum 34 Kd antigen.
- Treponema pallidum membrane protein A (gene tmpA).
- Vibrio harveyi chitobiase (gene chb).
- Yersinia virulence plasmid protein yscJ.
- Halocyanin from Natrobacterium pharaonis [4], a membrane associated copper-binding protein. This is the first archaebacterial protein known to be modified in such a fashion.

[1818] From the precursor sequences of all these proteins, we derived a consensus pattern and a set of rules to identify this type of post-translational modification.

[1819] Consensus pattern {DERK}(6)-[LIVMFWSTAG](2)-[LIVMFYSTAGCQ]-[AGS]-C [C is the lipid attachment site]
Additional rules: 1) The cysteine must be between positions 15 and 35 of the sequence in consideration. 2) There must be at least one Lys or one Arg in the first seven positions of the sequence.
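The consensus pattern and the two additional rules can be combined in a single check. The sketch below is one possible reading, offered as an illustration only: the names LIPOBOX and lipid_attachment_sites are hypothetical, the residue classes are as reconstructed from the garbled pattern line, and the example sequences are invented for the test.

```python
import re

# {DERK}(6)-[LIVMFWSTAG](2)-[LIVMFYSTAGCQ]-[AGS]-C written as a regular
# expression. The lookahead lets every start position be examined, so
# overlapping candidates are not skipped.
LIPOBOX = re.compile(r"(?=([^DERK]{6}[LIVMFWSTAG]{2}[LIVMFYSTAGCQ][AGS]C))")


def lipid_attachment_sites(seq: str) -> list:
    """Return 1-based positions of cysteines that satisfy the pattern and both rules."""
    seq = seq.upper()
    # Rule 2: at least one Lys or Arg in the first seven positions.
    if not any(residue in "KR" for residue in seq[:7]):
        return []
    sites = []
    for m in LIPOBOX.finditer(seq):
        # The cysteine is the last residue of the matched segment.
        cys_position = m.start(1) + len(m.group(1))
        # Rule 1: the cysteine lies between positions 15 and 35.
        if 15 <= cys_position <= 35:
            sites.append(cys_position)
    return sites


if __name__ == "__main__":
    toy_precursor = "MKATKLVLGAVILGSTLLAGCSSNAKIDQ"   # invented example
    print(lipid_attachment_sites(toy_precursor))
```

Applying the positional rules after the regular expression keeps the pattern itself identical to the published notation, so either part can be changed without touching the other.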

[ 1] Hayashi S., Wu H.C. J. Bioenerg. Biomembr. 22:451-471(1990).
[ 2] Klein P., Somorjai R.L., Lau P.C.K. Protein Eng. 2:15-20(1988).
[ 3] von Heijne G. Protein Eng. 2:531-534(1989).
[ 4] Mattar S., Scharf B., Kent S.B.H., Rodewald K., Oesterhelt D., Engelhard M. J. Biol. Chem. 269:14939-14945(1994).

[1820] 774. Aminoacyl-transfer RNA synthetases class-II signatures

PROSITE cross-reference(s): AA_TRNA_LIGASE_II_1, AA_TRNA_LIGASE_II_2
Aminoacyl-tRNA synthetases (EC 6.1.1.-) [1] are a group of enzymes which activate amino acids and transfer them to specific tRNA molecules as the first step in protein biosynthesis. In prokaryotic organisms there are at least twenty different types of aminoacyl-tRNA synthetases, one for each different amino acid. In eukaryotes there are generally two aminoacyl-tRNA synthetases for each different amino acid: one cytosolic form and a mitochondrial form. While all these enzymes have a common function, they are widely diverse in terms of subunit size and of quaternary structure.
[1821] The synthetases specific for alanine, asparagine, aspartic acid, glycine, histidine, lysine, phenylalanine, proline, serine, and threonine are referred to as class-II synthetases [2 to 6] and probably have a common folding pattern in their catalytic domain for the binding of ATP and amino acid which is different to the Rossmann fold observed for the class I synthetases [7].
[1822] Class-II tRNA synthetases do not share a high degree of similarity; however, at least three conserved regions are present [2,5,8]. Signature patterns from two of these regions have been derived.
[1823] Consensus pattern [FYH]-R-x-[DE]-x(4,12)-[RH]-x(3)-F-x(3)-[DE]

Consensus pattern [GSTALVF]-{DENQHRKP}-[GSTA]-[LIVMF]-[DE]-R-[LIVMF]-x-[LIVMSTAG]-[LIVMFY]

[ 1] Schimmel P. Annu. Rev. Biochem. 56:125-158(1987).
[ 2] Delarue M., Moras D. BioEssays 15:675-687(1993).
[ 3] Schimmel P. Trends Biochem. Sci. 16:1-3(1991).
[ 4] Nagel G.M., Doolittle R.F. Proc. Natl. Acad. Sci. U.S.A. 88:8121-8125(1991).
[ 5] Cusack S., Haertlein M., Leberman R. Nucleic Acids Res. 19:3489-3498(1991).
[ 6] Cusack S. Biochimie 75:1077-1081(1993).
[ 7] Cusack S., Berthet-Colominas C., Haertlein M., Nassar N., Leberman R. Nature 347:249-255(1990).
[ 8] Leveque F., Plateau P., Dessen P., Blanquet S. Nucleic Acids Res. 18:306-312(1990).

[1824] 775. X. Trans-activation protein X

This protein is found in hepadnaviruses where it is indispensable for replication. Number of members: 91
[1825] 776. Thymidylate synthase active site

[1826] Thymidylate synthase (EC 2.1.1.45) [1,2] catalyzes the reductive methylation of dUMP to dTMP with concomitant conversion of 5,10-methylenetetrahydrofolate to dihydrofolate. Thymidylate synthase plays an essential role in DNA synthesis and is an important target for certain chemotherapeutic drugs.

[1827] Thymidylate synthase is an enzyme of about 30 to 35 Kd in most species except in protozoan and plants where it exists as a bifunctional enzyme that includes a dihydrofolate reductase domain.

[1828] A cysteine residue is involved in the catalytic mechanism (it covalently binds the 5,6-dihydro-dUMP intermediate). The sequence around the active site of this enzyme is conserved from phages to vertebrates.
[1829] Consensus pattern R-x(2)-[LIVM]-x(3)-[FW]-[QN]-... [C is the active site residue]

[ 1] Benkovic S.J. Annu. Rev. Biochem. 49:227-251(1980).

[ 2] Ross P., O'Gara F., Condon S. Appl. Environ. Microbiol. 56:2156-2163(1990).

[1830] 777. Glycosyl hydrolases family 31 signatures

[1831] It has been shown [1,2,3,E1] that the following glycosyl hydrolases can be, on the basis of sequence similarities, classified into a single family:

- Lysosomal alpha-glucosidase (EC 3.2.1.20) (acid maltase) is a vertebrate glycosidase active at low pH, which hydrolyzes alpha(1->4) and alpha(1->6) linkages in glycogen, maltose, and isomaltose.
- Alpha-glucosidase (EC 3.2.1.20) from the yeast Candida tsukubaensis.
- Alpha-glucosidase (EC 3.2.1.20) (gene malA) from the archaebacteria Sulfolobus solfataricus.
- Intestinal sucrase-isomaltase (EC 3.2.1.48 / EC 3.2.1.10) is a vertebrate membrane-bound, multifunctional enzyme complex which hydrolyzes sucrose, maltose and isomaltose. The sucrase and isomaltase domains of the enzyme are homologous (41% of amino acid identity) and have most probably evolved by duplication.
- Glucoamylase 1 (EC 3.2.1.3) (glucan 1,4-alpha-glucosidase) from various fungal species.
- Yeast hypothetical protein YBR229c.
- Fission yeast hypothetical protein SpAC30D11.01c.

[1832] An aspartic acid has been implicated [4] in the catalytic activity of sucrase, isomaltase, and lysosomal alpha-glucosidase. The region around this active residue is highly conserved and can be used as a signature pattern. A second region, which contains two conserved cysteines, has been used as an additional signature pattern.
[1833] Consensus pattern [GF]-[LIVMF]-W-x-D-M-[NSA]-E [D is the active site residue]
Consensus pattern G-[AV]-D-[LIVMTA]-C-G-[FY]-x(3)-[ST]-x(3)-L-C-x-R-W-x(2)-[LV]-[GSA]-[SA]-F-x-P-F-x-R-[DN]

[ 1] Henrissat B. Biochem. J. 280:309-316(1991).

[ 2] Kinsella B.T., Hogan S., Larkin A., Cantwell B.A. Eur. J. Biochem. 202:657-664(1991).
[ 3] Naim H.Y., Niermann T., Kleinhans U., Hollenberg C.P., Strasser A.W.M. FEBS Lett. 294:109-112(1991).
[ 4] Hermans M.M.P., Kroos M.A., van Beeumen J., Oostra B.A., Reuser A.J.J. J. Biol. Chem. 266:13507-13512(1991).

[1834] 778. Urease signatures
[1835] Urease (EC 3.5.1.5) is a nickel-binding enzyme that catalyzes the hydrolysis of urea to carbon dioxide and ammonia [1]. Historically, it was the first enzyme to be crystallized (in 1926). It is mainly found in plant seeds, microorganisms and invertebrates. In plants, urease is a hexamer of identical chains. In bacteria [2], it consists of either two or three different subunits (alpha, beta and gamma).
[1836] Urease binds two nickel ions per subunit; four histidine, an aspartate and a carbamated-lysine serve as ligands to these metals; an additional histidine is involved in the catalytic mechanism [3].

[1837] As signatures for this enzyme, a region was selected that contains two histidine that bind one of the nickel ions and the region of the active site histidine.

[1838] C onsensus pattern l-jAv j-fGAHGAf {-[LlVMj- [> <-H-[t. ivMj H v, i> p ] (tie rv*o H's hind nrkelj 
Consensus pattern [LIVM](2)-[CT]-H-[HN]-L-x(3)-[LIVM]-x(2)-D-[LIVM]-x-F-A [H is the active site residue]

[ 1] Takishima K., Suga T., Mamiya G. Eur. J. Biochem. 175:151-165(1988).

[ 2] Mobley H.L.T., Hausinger R.P. Microbiol. Rev. 53:85-108(1989).

[ 3] Jabri E., Carr M.B., Hausinger R.P., Karplus P.A. Science 268:998-1004(1995).

[1839] 779. Tyrosine specific protein phosphatases signature and profiles

[1840] Tyrosine specific protein phosphatases (EC 3.1.3.48) (PTPase) [1 to 5] are enzymes that catalyze the removal of a phosphate group attached to a tyrosine residue. These enzymes are very important in the control of cell growth, proliferation, differentiation and transformation. Multiple forms of PTPase have been characterized and can be classified into two categories: soluble PTPases and transmembrane receptor proteins that contain PTPase domain(s). The currently known PTPases are listed below:
[1841] Soluble PTPases.

- PTPN1 (PTP-1B).
- PTPN2 (T-cell PTPase; TC-PTP).
- PTPN3 (H1) and PTPN4 (MEG), enzymes that contain an N-terminal band 4.1-like domain (see <PDOC00586>) and could act at junctions between the membrane and cytoskeleton.
- PTPN5 (STEP).
- PTPN6 (PTP-1C; HCP; SHP) and PTPN11 (PTP-2C; SH-PTP3; Syp), enzymes which contain two copies of the SH2 domain at its N-terminal extremity. The Drosophila protein corkscrew (gene csw) also belongs to this subgroup.
- PTPN7 (LC-PTP; Hematopoietic protein-tyrosine phosphatase; HePTP).
- PTPN8 (70Z-PEP).
- PTPN9 (MEG2).
- PTPN12 (PTP-G1; PTP-P19).
- Yeast PTP1.
- Yeast PTP2 which may be involved in the ubiquitin-mediated protein degradation pathway.
- Fission yeast pyp1 and pyp2 which play a role in inhibiting the onset of mitosis.
- Fission yeast pyp3 which contributes to the dephosphorylation of cdc2.
- Yeast CDC14 which may be involved in chromosome segregation.
- Yersinia virulence plasmid PTPases (gene yopH).
- Autographa californica nuclear polyhedrosis virus 19 Kd PTPase.

[1842] Dual specificity PTPases.

- DUSP1 (PTPN10; MAP kinase phosphatase-1; MKP-1) which dephosphorylates MAP kinase on both Thr-183 and Tyr-185.
- DUSP2 (PAC-1), a nuclear enzyme that dephosphorylates MAP kinases ERK1 and ERK2 on both Thr and Tyr residues.
- DUSP3 (VHR).
- DUSP4 (HVH2).
- DUSP5 (HVH3).
- DUSP6 (Pyst1; MKP-3).
- DUSP7 (Pyst2; MKP-X).
- Yeast MSG5, a PTPase that dephosphorylates MAP kinase FUS3.
- Yeast YVH1.
- Vaccinia virus H1 PTPase; a dual specificity phosphatase.

[1843] Receptor PTPases.
[1844] Structurally, all known receptor PTPases are made up of a variable length extracellular domain, followed by a transmembrane region and a C-terminal catalytic cytoplasmic domain. Some of the receptor PTPases contain fibronectin type III (FN-III) repeats, immunoglobulin-like domains, MAM domains or carbonic anhydrase-like domains in their extracellular region. The cytoplasmic region generally contains two copies of the PTPase domain. The first seems to have enzymatic activity, while the second is inactive but seems to affect substrate specificity of the first. In these domains, the catalytic cysteine is generally conserved but some other, presumably important, residues are not.
[1845] In the following table, the domain structure of known receptor PTPases is shown:



                                         Extracellular            Intracellular
                                         Ig   FN-3  CAH   MAM     PTPase
Leukocyte common antigen (LCA) (CD45)    0    2     0     0       2
Leukocyte antigen related (LAR)          3          0     0       2
Drosophila DLAR                          3    9     0     0       2
Drosophila DPTP                               2     0     0       2
PTP-alpha (LRP)                          0    0     0     0       2
PTP-beta                                 0    18    0     0       1
PTP-gamma                                0    1     1     0       2
PTP-delta                                0    >7    0     0       2
PTP-epsilon                              0    0     0     0       2
PTP-kappa                                1    4     0     1       2
PTP-mu                                   1    4     0     1       2
PTP-zeta                                 0    1     1     0       2



[1846] PTPase domains consist of about 300 amino acids. There are two conserved cysteines; the second one has been shown to be absolutely required for activity. Furthermore, a number of conserved residues in its immediate vicinity have also been shown to be important.

[1847] A signature pattern was derived for PTPase domains centered on the active site cysteine.

[1848] There are three profiles for PTPases: the first one spans the complete domain and is not specific to any subtype. The second profile is specific to dual-specificity PTPases and the third one to the PTP subfamily.

[1849] Consensus pattern [LIVMF]-H-C-x(2)-G-x(3)-[STC]-[STAGP]-x-[LIVMFY] [C is the active site residue]
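Because the bracketed annotation identifies one specific residue of the pattern, a search can report the position of that residue instead of the start of the match. The following short sketch assumes the pattern as reconstructed above and marks the cysteine with a named group; the names PTPASE_SITE and active_site_cysteines and the test peptide are hypothetical.

```python
import re

# [LIVMF]-H-C-x(2)-G-x(3)-[STC]-[STAGP]-x-[LIVMFY]; the named group "cys"
# marks the active-site cysteine inside the match.
PTPASE_SITE = re.compile(r"[LIVMF]H(?P<cys>C).{2}G.{3}[STC][STAGP].[LIVMFY]")


def active_site_cysteines(seq: str) -> list:
    """Return 1-based positions of cysteines located in a pattern match.

    Matches are taken left to right without overlap.
    """
    return [m.start("cys") + 1 for m in PTPASE_SITE.finditer(seq.upper())]


if __name__ == "__main__":
    toy_peptide = "MKVHCSAGVGRTGTFQ"   # invented example
    print(active_site_cysteines(toy_peptide))
```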

[1850] Note: the M-phase inducer phosphatases (cdc25-type phosphatase) are tyrosine-protein phosphatases that are not structurally related to the above PTPases.

[1851] Note: this documentation entry is linked to both a signature pattern and to profiles. As profiles are much more sensitive than the pattern, you should use them if you have access to the necessary software tools to do so.



[ 1] Fischer E.H., Charbonneau H., Tonks N.K. Science 253:401-406(1991).
[ 2] Charbonneau H., Tonks N.K. Annu. Rev. Cell Biol. 8:463-493(1992).
[ 3] Trowbridge I.S. J. Biol. Chem. 266:23517-23520(1991).
[ 4] Tonks N.K., Charbonneau H. Trends Biochem. Sci. 14:497-500(1989).
[ 5] Hunter T. Cell 58:1013-1016(1989).



[1852] 780. Connexins signatures

[1853] Gap junctions [1] are specialized regions of the plasma membrane which consist of closely packed pairs of transmembrane channels, the connexons, through which small molecules diffuse from a cell to a neighboring cell. Each connexon is composed of a hexamer of an integral membrane protein which is often referred to as connexin. In a given species there are a number of different, yet structurally related, tissue specific forms of connexins. The types of connexins which are currently known are listed below.



- Connexin 56 (Cx56).
- Connexin 50 (Cx50) (lens fiber protein MP70).
- Connexin 46 (Cx46) (alpha-3).
- Connexin 45 (Cx45) (alpha-6).
- Connexin 43 (Cx43) (alpha-1).
- Connexin 40 (Cx40) (alpha-5).
- Connexin 38 (Cx38) (alpha-2).
- Connexin 37 (Cx37) (alpha-4).
- Connexin 33 (Cx33) (alpha-7).
- Connexin 32 (Cx32) (beta-1).
- Connexin 31.1 (Cx31.1) (beta-4).
- Connexin 31 (Cx31) (beta-3).
- Connexin 30.3 (Cx30.3) (beta-5).
- Connexin 26 (Cx26) (beta-2).

[1854] Structurally the connexins consist of a short cytoplasmic N-terminal domain, followed by four transmembrane segments that delimit two extracellular and one cytoplasmic loops; the C-terminal domain is cytoplasmic and its length is variable (from 20 residues in Cx26 to 260 residues in Cx56). The schematic representation of this structure is shown below.



Cytoplasmic 



[1855] The sequences of the two extracellular loops are well conserved. In both loops there are three conserved cysteines, which are involved in disulfide bonds. A signature pattern from each of these two loop regions has been built.
[1856] Consensus pattern C-[DN]-T-x-Q-P-G-C-x(2)-V-C-[FY]-D [The three C's are involved in disulfide bonds]
Consensus pattern C-x(3,4)-P-C-x(3)-[LIVM]-[DEN]-C-[FY]-[LIVM]-[SA]-[KR]-P [The three C's are involved in disulfide bonds]

[1857] [ 1] Goodenough D.A., Goliger J.A., Paul D.L. Annu. Rev. Biochem. 65:475-502(1996).
[1858] 781. Gram-positive cocci surface proteins 'anchoring' hexapeptide

[1859] Surface proteins from Gram-positive cocci contain a conserved hexapeptide located a few residues upstream of a hydrophobic C-terminal membrane anchor region which is followed by a cluster of basic amino acids [1]. This structure is represented in the following schematic representation:



| Variable length extracellular domain |H| Anchor |B|



'H': conserved hexapeptide.
'B': cluster of basic residues.
[1860] It has been proposed that this hexapeptide sequence is responsible for a post-translational modification necessary for the proper anchoring of the proteins which bear it, to the cell wall.
Proteins known to contain such hexapeptide are listed below:



- Aggregation substance from Streptococcus faecalis (asa1).
- C5a peptidase from Streptococcus pyogenes (scpA).
- C protein alpha-antigen from Streptococcus agalactiae (bca).
- Cell surface antigen I/II (PAC) from Streptococcus mutans.
- Dextranase from Streptococcus downei (dex).
- Fibronectin-binding protein from Staphylococcus aureus (fnbA).
- Fimbrial subunits from Actinomyces naeslundii and viscosus.
- IgA binding protein from Streptococcus pyogenes (arp4).
- IgA binding protein (B antigen) from Streptococcus agalactiae (bag).
- IgG binding proteins from Streptococci and Staphylococcus aureus.
- Internalin A from Listeria monocytogenes (inlA).
- M proteins from streptococci.
- Muramidase-released protein from Streptococcus suis (mrp).
- Nisin leader peptide processing protease from Lactococcus lactis (nisP).
- Protein A from Staphylococcus aureus.
- Trypsin-resistant surface T protein from streptococci.
- Wall-associated protein from Streptococcus mutans (wapA).
- Wall-associated serine proteinases from Lactococcus lactis.



[1861] Consensus pattern L-P-x-T-G-[STGAVDE]
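The hexapeptide alone is short, so a search can also require the context shown in the schematic: a hydrophobic anchor a few residues after the hexapeptide, followed by a cluster of basic residues. The sketch below illustrates that idea under stated assumptions. Every threshold (max_gap, anchor_len, min_hydrophobic, tail_len, min_basic), the residue sets, and the function name find_anchor_signals are hypothetical choices made for illustration and are not taken from the entry.

```python
import re

MOTIF = re.compile(r"LP.TG[STGAVDE]")    # L-P-x-T-G-[STGAVDE]
HYDROPHOBIC = set("AVLIMFWGC")           # assumed hydrophobic residue set
BASIC = set("KR")                        # assumed basic residue set


def find_anchor_signals(seq, max_gap=10, anchor_len=15,
                        min_hydrophobic=0.7, tail_len=8, min_basic=2):
    """Return 1-based start positions of hexapeptides that are followed,
    within max_gap residues, by a hydrophobic stretch and a basic tail."""
    seq = seq.upper()
    hits = []
    for m in MOTIF.finditer(seq):
        for gap in range(max_gap + 1):
            start = m.end() + gap
            anchor = seq[start:start + anchor_len]
            if len(anchor) < anchor_len:          # ran off the C-terminus
                break
            tail = seq[start + anchor_len:start + anchor_len + tail_len]
            hydrophobic_fraction = sum(c in HYDROPHOBIC for c in anchor) / anchor_len
            basic_count = sum(c in BASIC for c in tail)
            if hydrophobic_fraction >= min_hydrophobic and basic_count >= min_basic:
                hits.append(m.start() + 1)
                break
    return hits


if __name__ == "__main__":
    # Invented sequence: filler, hexapeptide, short spacer, anchor, basic tail.
    toy = "M" + "S" * 20 + "LPETGE" + "ES" + "AAAAALLLLLVVVVV" + "KRRKE"
    print(find_anchor_signals(toy))
```

The thresholds trade sensitivity against false positives and would need tuning against known examples before any real use.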

[1862] [ 1] Schneewind O., Jones K.F., Fischetti V.A. J. Bacteriol. 172:3310-3317(1990).
[1863] 782. Gamma-glutamyltranspeptidase signature

[1864] Gamma-glutamyltranspeptidase (EC 2.3.2.2) (GGT) [1] catalyzes the transfer of the gamma-glutamyl moiety of glutathione to an acceptor that may be an amino acid, a peptide or water (forming glutamate). GGT plays a key role in the gamma-glutamyl cycle, a pathway for the synthesis and degradation of glutathione. In prokaryotes and eukaryotes, it is an enzyme that consists of two polypeptide chains, a heavy and a light subunit, processed from a single chain precursor. The active site of GGT is known to be located in the light subunit.

[1865] The sequences of mammalian and bacterial GGT show a number of regions of high similarity [2]. Pseudomonas cephalosporin acylases (EC 3.5.1.-) that convert 7-beta-(4-carboxybutanamido)-cephalosporanic acid (GL-7ACA) into 7-aminocephalosporanic acid (7ACA) and glutaric acid are evolutionary related to GGT and also show some GGT activity [3]. Like GGT, these GL-7ACA acylases are also composed of two subunits.
[1866] One of the conserved regions corresponds to the N-terminal extremity of the mature light chains of these enzymes. This region has been used as a signature pattern.

[1867] Consensus pattern T-[STA]-H-x-[ST]-[LIVMA]-x(4)-G-[SN]-x-V-[STA]-x-T-x-T-[LIVM]-[NE]-x(1,2)-[FY]-G



[ 1] Tate S.S., Meister A. Meth. Enzymol. 113:400-419(1985).

[ 2] Suzuki H., Kumagai H., Echigo T., Tochikura T. J. Bacteriol. 171:5169-5172(1989).
[ 3] Ishiye M., Niwa M. Biochim. Biophys. Acta 1132:233-239(1992).



[1868] 783. Ferrochelatase signature
[1869] Ferrochelatase (EC 4.99.1.1) (protoheme ferro-lyase) [1,2] catalyzes the last step in heme biosynthesis: the chelation of a ferrous ion to proto-porphyrin IX, to form protoheme.

[1870] In eukaryotes, ferrochelatase is a mitochondrial protein bound to the inner membrane, whose active site faces the mitochondrial matrix. The mature form of eukaryotic ferrochelatase is composed of about 360 amino acids. In bacteria, ferrochelatase (gene hemH) [3] is a protein of from 310 to 380 amino acids.
[1871] The human autosomal dominant disease protoporphyria is due to the reduced activity of ferrochelatase.

[1872] The signature pattern for this enzyme is based on a conserved region which contains a histidine residue which could be involved in binding iron.

[1873] Consensus pattern [LIVMF](2)-x-[ST]-x-H-[GS]-[LIVM]-P-x(4,5)-[DENQKR]-x-G-[DP]-x(1,2)-Y



[ 1] Labbe-Bois R. J. Biol. Chem. 265:7278-7283(1990).

[ 2] Brenner D.A., Frasier F. Proc. Natl. Acad. Sci. U.S.A. 88:849-853(1991).

[ 3] Miyamoto K., Nakahigashi K., Nishimura K., Inokuchi H. J. Mol. Biol. 219:393-398(1991).
[1874] 784. Cellulose-binding domain, bacterial type

[1875] The microbial degradation of cellulose and xylans requires several types of enzymes such as endoglucanases (EC 3.2.1.4), cellobiohydrolases (EC 3.2.1.91) (exoglucanases), or xylanases (EC 3.2.1.8) [1].
[1876] Structurally, cellulases and xylanases generally consist of a catalytic domain joined to a cellulose-binding domain (CBD) by a short linker sequence rich in proline and/or hydroxy-amino acids.

[1877] The CBD of a number of bacterial cellulases has been shown to consist of about 105 amino acid residues [2]. Enzymes known to contain such a domain are:

- Endoglucanase (gene end1) from Butyrivibrio fibrisolvens.
- Endoglucanases A (gene cenA) and B (cenB) from Cellulomonas fimi.
- Exoglucanases A (gene cbhA) and B (cbhB) from Cellulomonas fimi.
- Endoglucanase E-2 (gene celB) from Thermomonospora fusca.
- Endoglucanase A (gene celA) from Microbispora bispora.
- Endoglucanases A (gene celA), B (celB) and C (celC) from Pseudomonas fluorescens.
- Endoglucanase A (gene celA) from Streptomyces lividans.
- Exocellobiohydrolase (gene cex) from Cellulomonas fimi.
- Xylanases A (gene xynA) and B (xynB) from Pseudomonas fluorescens.
- Arabinofuranosidase C (EC 3.2.1.55) (xylanase C) (gene xynC) from Pseudomonas fluorescens.
- Chitinase 63 (EC 3.2.1.14) from Streptomyces plicatus.
- Chitinase C from Streptomyces lividans.

[1878] The CBD domain is found either at the N-terminal or at the C-terminal extremity of these enzymes. As it is shown in the following schematic representation, there are two conserved cysteines in this CBD domain - one at each extremity of the domain - which have been shown [3] to be involved in a disulfide bond. There are also four conserved tryptophan residues which could be involved in the interaction of the CBD with polysaccharides.



 +-------------------------------------------------+
 |                                                 |
xCxxxxWxxxxxNxxxWxxxxxxxWxxxxxxxxWNxxxxxGxxxxxxxxxxCx
                                 **************

'C': conserved cysteine involved in a disulfide bond. '*': position of the pattern.

Consensus pattern W-N-[STAGR]-[STDN]-[LIVM]-x(2)-[GST]-x-[GST]-x(2)-[LIVMFT]-[GA]

[ 1] Gilkes N.R., Henrissat B., Kilburn D.G., Miller R.C. Jr., Warren R.A.J. Microbiol. Rev. 55:303-315(1991).
[ 2] Meinke A., Gilkes N.R., Kilburn D.G., Miller R.C. Jr., Warren R.A.J. Protein Seq. Data Anal. 4:349-353(1991).
[ 3] Gilkes N.R., Claeyssens M., Aebersold R., Henrissat B., Meinke A., Morrison H.D., Kilburn D.G., Warren R.A.J., Miller R.C. Jr. Eur. J. Biochem. 202:367-377(1991).

[1879] 785. Amidases signature

[1880] It has been shown [1,2,3] that several enzymes from various prokaryotic and eukaryotic organisms which are involved in the hydrolysis of amides (amidases) are evolutionary related. These enzymes are listed below.

- Indoleacetamide hydrolase (EC 3.5.1.-), a bacterial plasmid-encoded enzyme that catalyzes the hydrolysis of indole-3-acetamide (IAM) into indole-3-acetate (IAA), the second step in the biosynthesis of auxins from tryptophan.
- Acetamidase from Emericella nidulans (gene amdS), an enzyme which allows acetamide to be used as a sole carbon or nitrogen source.
- Amidase (EC 3.5.1.4) from Rhodococcus sp. N-774 and Brevibacterium sp. R312 (gene amdA). This enzyme hydrolyzes propionamides efficiently, and also at a lower efficiency, acetamide, acrylamide and indoleacetamide.
- Amidase (EC 3.5.1.4) from Pseudomonas chlororaphis.
- 6-aminohexanoate-cyclic-dimer hydrolase (EC 3.5.2.12) (nylon oligomers degrading enzyme E1) (gene nylA), a bacterial plasmid encoded enzyme which catalyzes the first step in the degradation of 6-aminohexanoic acid cyclic dimer, a by-product of nylon manufacture [4].
- Glutamyl-tRNA(Gln) amidotransferase subunit A [5].
- Mammalian fatty acid amide hydrolase (gene FAAH) [6].
- A putative amidase from yeast (gene AMD2).
- Mycobacterium tuberculosis putative amidases amiA2, amiB2, amiC and amiD.

[1881] All these enzymes contain in their central section a highly conserved region rich in glycine, serine, and alanine residues. This region has been used as a signature pattern.
Consensus pattern: G-jGA]-S"[GSHGSK^MGSAHGSAVY]"V"(UVM]-jGS 
[LIVM]-R-x-P-[GSAC]

[ 1] Mayaux J.-F., Cerbelaud E., Soubrier F., Faucher D., Petre D. J. Bacteriol. 172:6764-6773(1990).
[ 2] Hashimoto Y., Nishiyama M., Ikehata O., Horinouchi S., Beppu T. Biochim. Biophys. Acta 1088:225-233(1991).
[ 3] Chang T.-H., Abelson J. Nucleic Acids Res. 18:7180-7180(1990).
[ 4] Tsuchiya K., Fukuyama S., Kanzaki N., Kanagawa K., Negoro S., Okada H. J. Bacteriol. 171:3187-3191(1989).
[ 5] Curnow A.W., Hong K.W., Yuan R., Kim S.I., Martins O., Winkler W., Henkin T.M., Soll D. Proc. Natl. Acad. Sci. U.S.A. 94:11819-11826(1997).
[ 6] Cravatt B.F., Giang D.K., Mayfield S.P., Boger D.L., Lerner R.A., Gilula N.B. Nature 384:83-87(1996).

[1882] 786. Glycosyl hydrolases family 10 active site

[1883] The microbial degradation of cellulose and xylans requires several types of enzymes such as endoglucanases (EC 3.2.1.4), cellobiohydrolases (EC 3.2.1.91) (exoglucanases), or xylanases (EC 3.2.1.8) [1,2]. Fungi and bacteria produce a spectrum of cellulolytic enzymes (cellulases) and xylanases which, on the basis of sequence similarities, can be classified into families. One of these families is known as the cellulase family F [3] or as the glycosyl hydrolases family 10 [4,E1]. The enzymes which are currently known to belong to this family are listed below.

- Aspergillus awamori xylanase A (xynA).
- Bacillus sp. strain 125 xylanase (xynA).
- Bacillus stearothermophilus xylanase.
- Butyrivibrio fibrisolvens xylanases A (xynA) and B (xynB).
- Caldocellum saccharolyticum bifunctional endoglucanase/exoglucanase (celB). This protein consists of two domains; it is the N-terminal domain, which has exoglucanase activity, which belongs to this family.
- Caldocellum saccharolyticum xylanase A (xynA).
- Caldocellum saccharolyticum ORF4. This hypothetical protein is encoded in the xynABC operon and is probably a xylanase.
- Cellulomonas fimi exoglucanase/xylanase (cex).
- Clostridium stercorarium thermostable celloxylanase.
- Clostridium thermocellum xylanases Y (xynY) and Z (xynZ).
- Cryptococcus albidus xylanase.
- Penicillium chrysogenum xylanase (gene xylP).
- Pseudomonas fluorescens xylanases A (xynA) and B (xynB).
- Ruminococcus flavefaciens bifunctional xylanase XYLA (xynA). This protein consists of three domains: a N-terminal xylanase catalytic domain that belongs to family 11 of glycosyl hydrolases; a central domain composed of short repeats of Gln, Asn and Trp; and a C-terminal xylanase catalytic domain that belongs to family 10 of glycosyl hydrolases.
- Streptomyces lividans xylanase A (xlnA).
- Thermoanaerobacter saccharolyticum endoxylanase A (xynA).
- Thermoascus aurantiacus xylanase.
- Thermophilic bacterium Rt8.84 xylanase (xynA).

[1884] One of the conserved regions in these enzymes is centered on a conserved glutamic acid residue which has been shown [5], in the exoglucanase from Cellulomonas fimi, to be directly involved in glycosidic bond cleavage by acting as a nucleophile. This region has been used as a signature pattern.

[1885] Consensus pattern [GTA]-x(2)-[LIVN]-x-[IVMF]-[ST]-E-[LIY]-[DN]-[LIVMF] [E is the active site residue]

[ 1] Beguin P. Annu. Rev. Microbiol. 44:219-248(1990).

[ 2] Gilkes N.R., Henrissat B., Kilburn D.G., Miller R.C. Jr., Warren R.A.J. Microbiol. Rev. 55:303-315(1991).
[ 3] Henrissat B., Claeyssens M., Tomme P., Lemesle L., Mornon J.-P. Gene 81:83-95(1989).
[ 4] Henrissat B. Biochem. J. 280:309-316(1991).

[ 5] Tull D., Withers S.G., Gilkes N.R., Kilburn D.G., Warren R.A.J., Aebersold R. J. Biol. Chem. 266:15621-15625(1991).

[1886] 787. Fructose-bisphosphate aldolase class-II signatures

[1887] Fructose-bisphosphate aldolase (EC 4.1.2.13) [1,2] is a glycolytic enzyme that catalyzes the reversible aldol cleavage or condensation of fructose-1,6-bisphosphate into dihydroxyacetone-phosphate and glyceraldehyde 3-phosphate. There are two classes of fructose-bisphosphate aldolases with different catalytic mechanisms. Class-II aldolases [2], mainly found in prokaryotes and fungi, are homodimeric enzymes which require a divalent metal ion - generally zinc - for their activity.

[1888] This family also includes the following proteins:

Escherichia coli galactitol operon protein gatY, which catalyzes the transformation of tagatose 1,6-bisphosphate
into glycerone phosphate and D-glyceraldehyde 3-phosphate.
Escherichia coli N-acetyl galactosamine operon protein agaY, which catalyzes the same reaction as that of gatY.

[1889] As signature patterns for this class of enzyme, two conserved regions were selected. The first pattern is
located in the first half of the sequence and contains two histidine residues that have been shown [4] to be involved in
binding a zinc ion. The second is located in the C-terminal section and contains clustered acidic residues and glycines.
[1890] Consensus pattern: [FYVMT]-x(1,3)-[LIVMH]-[APN]-[LIVM]-x(1,2)-[LIVM]-H-x-D-H-[GACH] [The two H's are
zinc ligands]

Consensus pattern: [LIVM]-E-x-E-[LIVM]-G-x(2)-[GM]-[GSTA]-x-E
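
The statement that the first signature lies in the first half of the sequence and the second in the C-terminal section can be checked mechanically. The following is a minimal Python sketch, assuming the two consensus patterns read [FYVMT]-x(1,3)-[LIVMH]-[APN]-[LIVM]-x(1,2)-[LIVM]-H-x-D-H-[GACH] and [LIVM]-E-x-E-[LIVM]-G-x(2)-[GM]-[GSTA]-x-E. The regular expressions are hand transcriptions of that reading, and the function name is hypothetical; neither comes from the specification.

```python
import re

# Hand-written regex transcriptions of the two class-II aldolase patterns.
# Both are assumptions about how the printed patterns should be read.
ALDOLASE_II_1 = re.compile(r"[FYVMT].{1,3}[LIVMH][APN][LIVM].{1,2}[LIVM]H.DH[GACH]")
ALDOLASE_II_2 = re.compile(r"[LIVM]E.E[LIVM]G.{2}[GM][GSTA].E")


def has_both_signatures(sequence: str) -> bool:
    """Return True if signature 1 starts in the first half of the sequence
    and signature 2 occurs somewhere after the end of signature 1."""
    first = ALDOLASE_II_1.search(sequence)
    if first is None or first.start() >= len(sequence) / 2:
        return False
    # Only look for the second signature downstream of the first one.
    return ALDOLASE_II_2.search(sequence, first.end()) is not None
```

A sequence that carries only one of the two signatures is rejected, which mirrors the use of both regions together as the family signature.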

[1] Perham R.N. Biochem. Soc. Trans. 18:185-187(1990).
[2] Marsh J.J., Lebherz H.G. Trends Biochem. Sci. 17:110-113(1992).
[3] von der Osten C.H., Barbas C.F. III, Wong C.-H., Sinskey A.J. Mol. Microbiol. 3:1625-1637(1989).
[4] Berry A., Marshall K.E. FEBS Lett. 318:11-16(1993).

[1891] 788. Prolyl oligopeptidase family serine active site
[1892] The prolyl oligopeptidase family [1,2,3] consists of a number of evolutionary related peptidases whose catalytic
activity seems to be provided by a charge relay system similar to that of the trypsin family of serine proteases, but
which evolved by independent convergent evolution. The known members of this family are listed below.

Prolyl endopeptidase (EC 3.4.21.26) (PE) (also called post-proline cleaving enzyme). PE is an enzyme that cleaves
peptide bonds on the C-terminal side of prolyl residues. The sequence of PE has been obtained from a mammalian
species (pig) and from bacteria (Flavobacterium meningosepticum and Aeromonas hydrophila); there is a high
degree of sequence conservation between these sequences.

Escherichia coli protease II (EC 3.4.21.83) (oligopeptidase B) (gene prtB), which cleaves peptide bonds on the
C-terminal side of lysyl and argininyl residues.
Dipeptidyl peptidase IV (EC 3.4.14.5) (DPP IV). DPP IV is an enzyme that removes N-terminal dipeptides sequentially
from polypeptides having unsubstituted N-termini provided that the penultimate residue is proline.
Yeast vacuolar dipeptidyl aminopeptidase A (DPAP A) (gene STE13), which is responsible for the proteolytic
maturation of the alpha-factor precursor.

Yeast vacuolar dipeptidyl aminopeptidase B (DPAP B) (gene DAP2).
Acylamino-acid-releasing enzyme (EC 3.4.19.1) (acyl-peptide hydrolase). This enzyme catalyzes the hydrolysis
of the amino-terminal peptide bond of an N-acetylated protein to generate an N-acetylated amino acid and a protein
with a free amino-terminus.

[1893] A conserved serine residue has experimentally been shown (in E. coli protease II as well as in pig and bacterial
PE) to be necessary for the catalytic mechanism. This serine, which is part of the catalytic triad (Ser, His, Asp), is
generally located about 150 residues away from the C-terminal extremity of these enzymes (which are all proteins that
contain about 700 to 800 amino acids).

[1894] Consensus pattern: D-x(3)-A-x(3)-[LIVMFYW]-x(14)-G-x-S-x-G-G-[LIVMFYW](2) [S is the active site residue]
[1895] Note: these proteins belong to families S9A/S9B/S9C in the classification of peptidases [4,E1].
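
The position of the active-site serine relative to the C-terminus can be read off a pattern match directly. The sketch below is illustrative only: it assumes the consensus pattern reads D-x(3)-A-x(3)-[LIVMFYW]-x(14)-G-x-S-x-G-G-[LIVMFYW](2), transcribes it by hand into a Python regular expression with the serine in a capture group, and uses a hypothetical function name.

```python
import re

# Hand transcription of the assumed pattern; group 1 captures the
# active-site serine so that its position can be reported.
POP_SIGNATURE = re.compile(r"D.{3}A.{3}[LIVMFYW].{14}G.(S).GG[LIVMFYW]{2}")


def active_site_serines(sequence: str):
    """Return (position, residues_after) for every signature hit.

    position is the 1-based index of the captured serine and
    residues_after is the number of residues between it and the C-terminus.
    """
    hits = []
    for match in POP_SIGNATURE.finditer(sequence):
        position = match.start(1) + 1
        hits.append((position, len(sequence) - position))
    return hits
```

For a family member of 700 to 800 residues, a hit with residues_after in the region of 150 would be consistent with the description above; a hit far from that distance may deserve a closer look.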

[1] Rawlings N.D., Polgar L., Barrett A.J. Biochem. J. 279:907-911(1991).
[2] Barrett A.J., Rawlings N.D. Biol. Chem. Hoppe-Seyler 373:353-360(1992).
[3] Polgar L., Szabo E. Biol. Chem. Hoppe-Seyler 373:361-366(1992).
[4] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:19-61(1994).
[1896] 789. Formate--tetrahydrofolate ligase signatures

[1897] Formate--tetrahydrofolate ligase (EC 6.3.4.3) (formyltetrahydrofolate synthetase) (FTHFS) is one of the
enzymes participating in the transfer of one-carbon units, an essential element of various biosynthetic pathways. In many
of these processes the transfers of one-carbon units are mediated by the coenzyme tetrahydrofolate (THF). Various
reactions generate one-carbon derivatives of THF which can be interconverted between different oxidation states by
FTHFS, methylenetetrahydrofolate dehydrogenase (EC 1.5.1.5) and methenyltetrahydrofolate cyclohydrolase (EC
3.5.4.9).

[1898] In eukaryotes the FTHFS activity is expressed by a multifunctional enzyme, C-1-tetrahydrofolate synthase
(C1-THF synthase), which also catalyzes the dehydrogenase and cyclohydrolase activities. Two forms of C1-THF
synthases are known [1]: one is located in the mitochondrial matrix, while the second one is cytoplasmic. In both forms
the FTHFS domain consists of about 600 amino acid residues and is located in the C-terminal section of C1-THF
synthase. In prokaryotes FTHFS activity is expressed by a monofunctional homotetrameric enzyme of about 560 amino
acid residues [2].

[1899] The sequence of FTHFS is highly conserved in all forms of the enzyme. As signature patterns, two regions
that are almost perfectly conserved were selected. The first one is a glycine-rich segment located in the N-terminal
part of FTHFS and which could be part of an ATP-binding domain [2]. The second pattern is located in the central
section of FTHFS.
[1900] Consensus pattern: G-[LIVM]-K-G-G-A-A-G-G-G-Y
Consensus pattern: V-A-T-[IV]-R-A-L-K-x-[HN]-G-G
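
The consensus patterns in this section follow the PROSITE notation: positions are separated by hyphens, x stands for any residue, square brackets list the residues accepted at a position, curly braces list residues that are excluded, and a parenthesised number or range gives a repeat count. A minimal Python sketch, assuming that reading of the notation, translates such a pattern into a regular expression. The function name prosite_to_regex and the test peptide are illustrative and do not come from the specification; terminal anchors are not handled.

```python
import re

# One pattern element: x, a single residue, [allowed] or {excluded},
# optionally followed by a repeat count (n) or range (n,m).
_ELEMENT = re.compile(r"^(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?$")


def prosite_to_regex(pattern: str) -> str:
    """Translate a hyphen-separated PROSITE-style pattern into a regex string."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        match = _ELEMENT.match(element)
        if match is None:
            raise ValueError(f"unrecognised pattern element: {element!r}")
        core, low, high = match.groups()
        if core == "x":
            core = "."                       # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"   # excluded residues
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)


# Usage with the second FTHFS pattern and a made-up peptide.
regex = prosite_to_regex("V-A-T-[IV]-R-A-L-K-x-[HN]-G-G")
hit = re.search(regex, "MKVATIRALKMHGGS")
print(regex, hit.start() + 1 if hit else None)
```

Keeping the translation in one small function means every pattern is interpreted the same way, so a misread pattern shows up as a parse error instead of a silently wrong scan.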

[ 1] Shannon K.W.. Rabinowitz J.C. J. Biol. Chem. 263:7717-7725(1988). 

[ 2] Lovell C.R., Przybyla A., Ljungdahl L.G. Biochemistry 29:5687-5694(1990).

[1901] 790. Transthyretin signatures

[1902] Transthyretin (prealbumin) [1] is a thyroid hormone-binding protein that seems to transport thyroxine (T4)
from the bloodstream to the brain. It is a protein of about 130 amino acids that assembles as a homotetramer and
forms an internal channel that binds thyroxine. Transthyretin is mainly synthesized in the brain choroid plexus. In
humans, variants of the protein are associated with distinct forms of amyloidosis.

[1903] The sequence of transthyretin is highly conserved in vertebrates. A number of uncharacterized proteins also
belong to this family:

Escherichia coli hypothetical protein yedX.
Bacillus subtilis hypothetical protein yunM.

Caenorhabditis elegans hypothetical protein R09H10.3.
Caenorhabditis elegans hypothetical protein ZK697.8.

[1904] Two regions were selected as signature patterns. The first, located in the N-terminal extremity, starts with a
lysine known to be involved in binding T4. The second pattern is located in the C-terminal extremity.
[1905] Consensus pattern: [KH]-[IV]-L-[DN]-x(3)-G-x-P-A-x(2)-[IV]-x-[IV] [The K binds thyroxine]
Consensus pattern: Y-[TH]-[IV]-[AP]-x(2)-L-S-[PQ]-[FYW]-[GS]-[FY]-[QS]
[1906] [1] Schreiber G., Richardson S.J. Comp. Biochem. Physiol. 116B:137-160(1997).
[1907] 791. Dihydropteroate synthase signatures 
[1908] All organisms require reduced folate cofactors for the synthesis of a variety of metabolites. Most
microorganisms must synthesize folate de novo because they lack the active transport system of higher vertebrate cells which
allows these organisms to use dietary folates. Enzymes that are involved in the biosynthesis of folates are therefore
the target of a variety of antimicrobial agents such as trimethoprim or sulfonamides.

[1909] Dihydropteroate synthase (EC 2.5.1.15) (DHPS) catalyzes the condensation of 6-hydroxymethyl-7,8-dihydropteridine
pyrophosphate to para-aminobenzoic acid to form 7,8-dihydropteroate. This is the second step in the three-step
pathway leading from 6-hydroxymethyl-7,8-dihydropterin to 7,8-dihydrofolate. DHPS is the target of sulfonamides,
which are substrate analogs that compete with para-aminobenzoic acid.

[1910] Bacterial DHPS (gene sul or folP) [1] is a protein of about 275 to 315 amino acid residues which is either
chromosomally encoded or found on various antibiotic resistance plasmids. In the lower eukaryote Pneumocystis
carinii, DHPS is the C-terminal domain of a multifunctional folate synthesis enzyme (gene fas) [2].

[1911] Two signature patterns for DHPS were developed. The first signature is located in the N-terminal section of
these enzymes, while the second signature is located in the central section.
[1912] Consensus pattern: [LIVM]-x-[AG]-[LIVMF](2)-N-x-T-x-D-S-F-x-D-x-[SG]




Consensus pattern: [GE]-[SA]-x-[LIVM](2)-D-[LIVM]-G-[GP]-x(2)-[STA]-x-P

[1] Slock J., Stahly D.P., Han C.-Y., Six E.W., Crawford I.P. J. Bacteriol. 172:7211-7226(1990).
[2] Volpe F., Dyer M., Scaife J.G., Darby G., Stammers D.K., Delves C.J. Gene 112:213-218(1992).

[1913] 792. Phosphatidylinositol 3- and 4-kinases signatures

[1914] Phosphatidylinositol 3-kinase (PI3-kinase) (EC 2.7.1.137) [1] is an enzyme that phosphorylates
phosphoinositides on the 3-hydroxyl group of the inositol ring. The exact function of the three products of PI3-kinase
(PI-3-P, PI-3,4-P(2) and PI-3,4,5-P(3)) is not yet known, although it is proposed that they function as second
messengers in cell signalling. Currently, three forms of PI3-kinase are known:

The mammalian enzyme, which is a heterodimer of a 110 Kd catalytic chain (p110) and an 85 Kd subunit (p85)
which allows it to bind to activated tyrosine protein kinases. There are at least two different types of p110 subunits
(alpha and beta).

Yeast TOR1/DRR1 and TOR2/DRR2 [2], PI3-kinases required for cell cycle activation. Both are proteins of about
280 Kd.

Yeast VPS34 [3], a PI3-kinase involved in vacuolar sorting and segregation. VPS34 is a protein of about 100 Kd.
Arabidopsis thaliana and soybean VPS34 homologs.

[1915] Phosphatidylinositol 4-kinase (PI4-kinase) (EC 2.7.1.67) [4] is an enzyme that acts on phosphatidylinositol
(PI) in the first committed step in the production of the second messenger inositol-1,4,5-trisphosphate. Currently the
following forms of PI4-kinases are known:

Human PI4-kinase alpha.
Yeast PIK1, a nuclear protein of 120 Kd.
Yeast STT4, a protein of 214 Kd.

[1916] The PI3- and PI4-kinases share a well conserved domain at their C-terminal section; this domain seems to
be distantly related to the catalytic domain of protein kinases [2]. Two signature patterns were developed from the best
conserved parts of this domain.

[1917] Four additional proteins belong to this family: 

Mammalian FKBP-rapamycin associated protein (FRAP) [5], which acts as the target for the cell-cycle arrest and
immunosuppressive effects of the FKBP12-rapamycin complex.
Yeast protein ESR1 [6], which is required for cell growth, DNA repair and meiotic recombination.
Yeast protein TEL1, which is involved in controlling telomere length.
Yeast hypothetical protein YHR099w, a distantly related member of this family.
Fission yeast hypothetical protein SpAC22E12.16c.

[1918] Consensus pattern: [LIVMFAC]-K-x(1,3)-[DEA]-[DE]-[LIVMC]-R-Q-[DE]-x(4)-Q

Consensus pattern: [GS]-x-[AV]-x(3)-[LIVM]-x(2)-[FYH]-[LIVM](2)-x-[LIVMF]-x-D-R-H-x(2)-N
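
A signature of this kind is typically applied to a collection of sequences to pick out candidate family members. The following Python sketch shows one way this could be done for FASTA-formatted text, assuming the second consensus pattern reads [GS]-x-[AV]-x(3)-[LIVM]-x(2)-[FYH]-[LIVM](2)-x-[LIVMF]-x-D-R-H-x(2)-N. The regular expression is a hand transcription of that reading, and the function names and record names are hypothetical.

```python
import re
from typing import Iterator, List, Tuple

# Hand transcription of the assumed second PI3/PI4-kinase pattern.
PI_KINASE_2 = re.compile(r"[GS].[AV].{3}[LIVM].{2}[FYH][LIVM]{2}.[LIVMF].DRH.{2}N")


def read_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from FASTA-formatted text."""
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        else:
            chunks.append(line.upper())
    if name is not None:
        yield name, "".join(chunks)


def records_with_signature(text: str) -> List[str]:
    """Return the names of the records whose sequence contains the signature."""
    return [name for name, seq in read_fasta(text) if PI_KINASE_2.search(seq)]
```

A pattern hit only flags a candidate. Distantly related members such as YHR099w may still need the second signature or a full alignment to be assigned with confidence.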

[1] Hiles I.D., Otsu M., Volinia S., Fry M.J., Gout I., Dhand R., Panayotou G., Ruiz-Larrea F., Thompson A., Totty
N.F., Hsuan J.J., Courtneidge S.A., Parker P.J., Waterfield M.D. Cell 70:419-429(1992).
[2] Kunz J., Henriquez R., Schneider U., Deuter-Reinhard M., Movva N.R., Hall M.N. Cell 73:585-596(1993).
[3] Schu P.V., Takegawa K., Fry M.J., Stack J.H., Waterfield M.D., Emr S.D. Science 260:88-91(1993).
[4] Garcia-Bustos J.F., Marini F., Stevenson I., Frei C., Hall M.N. EMBO J. 13:2352-2361(1994).
[5] Brown E.J., Albers M.W., Shin T.B., Ichikawa K., Keith C.T., Lane W.S., Schreiber S.L. Nature 369:756-758
(1994).
[6] Kato R., Ogawa H. Nucleic Acids Res. 22:3104-3112(1994).

[1919] 793. FAD-dependent glycerol-3-phosphate dehydrogenase signatures

[1920] FAD-dependent glycerol-3-phosphate dehydrogenase (EC 1.1.99.5) (GPD) catalyzes the conversion of
glycerol-3-phosphate into dihydroxyacetone phosphate. In bacteria [1] it is associated with the utilization of glycerol coupled
to respiration. In Escherichia coli two isozymes are known: one expressed under anaerobic conditions (gene glpA)
and one in aerobic conditions (gene glpD). In eukaryotes a mitochondrial form of GPD participates in the glycerol
phosphate shuttle in conjunction with an NAD-dependent cytoplasmic GPD (EC 1.1.1.8) [2,3].
[1921] These enzymes are proteins of about 60 to 70 Kd which contain a probable FAD-binding domain in their
N-terminal extremity. The mammalian enzyme differs from the bacterial or yeast proteins by having an EF-hand
calcium-binding region (see <PDOC00018>) in its C-terminal extremity.

[1922] Two signature patterns were developed: one based on the first half of the FAD-binding domain and one which
corresponds to a conserved region in the central part of these enzymes.
[1923] Consensus pattern: [IV]-G-G-G-x(2)-G-[STACV]-G-x-A-x-D-x(3)-R-G
Consensus pattern: G-G-K-x(2)-[GSTE]-Y-R-x(2)-A

[1] Austin D., Larson T.J. J. Bacteriol. 173:101-107(1991).
[2] Roennow B., Kielland-Brandt M.C. Yeast 9:1121-1130(1993).
[3] Brown L.J., McDonald M.J., Lehn D.A., Moran S.M. J. Biol. Chem. 269:14363-14366(1994).

[1924] 794. NOL1/NOP2/sun family signature

[1925] The following proteins seem to be evolutionary related:

Mammalian proliferating-cell nucleolar antigen p120 (gene NOL1), which may play a role in the regulation of the
cell cycle and the increased nucleolar activity that is associated with cell proliferation.

Yeast nucleolar protein NOP2 (or YNA1), which could be involved in nucleolar function during the onset of growth,
and in the maintenance of nucleolar structure.

Yeast hypothetical protein YBL024w.
Bacterial protein sun (also known as fmu).

Escherichia coli hypothetical protein yebU.

Mycobacterium tuberculosis hypothetical protein MtCY21B4.24.

Methanococcus jannaschii hypothetical protein MJ0026.

NOL1 is a protein of 855 residues, NOP2 consists of 618 residues, YBL024w of 684, sun is a protein of about 430 to
450 residues and MJ0026 has 274 residues. They share a conserved central domain which contains some highly
conserved regions. One of these regions was selected as a signature pattern.
[1926] Consensus pattern: [FV]-D-[KRA]-[LIVMA]-L-x-D-[AV]-P-C-[ST]-[GA]
[1927] 795. moaA / nifB / pqqE family signature

[1928] A number of proteins involved in the biosynthesis of metallo cofactors have been shown [1,2] to be evolutionary
related. These proteins are:

Bacterial and archebacterial protein moaA, which is involved in the biosynthesis of the molybdenum cofactor
(molybdopterin; MPT).

Arabidopsis thaliana cnx2, a protein involved in molybdopterin biosynthesis and which is highly similar to moaA.
Bacillus subtilis narA, which seems to be the moaA ortholog in that bacteria.

Bacterial protein nifB (or fixZ), which is involved in the biosynthesis of the nitrogenase iron-molybdenum cofactor.
Bacterial protein pqqE, which is involved in the biosynthesis of the cofactor pyrrolo-quinoline-quinone (PQQ).
Pyrococcus furiosus cmo, a protein involved in the synthesis of a molybdopterin-based tungsten cofactor.
Caenorhabditis elegans hypothetical protein F49E2.1.

[1929] All these proteins share, in their N-terminal region, a conserved domain that contains three cysteines. In
moaA, these cysteines have been shown [1] to be important for the biological activity. They could be involved in the
binding of an iron-sulfur cluster.
[1930] Consensus pattern: [LIV]-x(3)-C-[NP]-[LIVMF]-[QRS]-C-x-[FYM]-C [The three C's are putative Fe-S ligands]

[1] Menendez C., Igloi G., Henninger H., Brandsch R. Arch. Microbiol. 164:142-151(1995).
[2] Hoff T., Schnorr K.M., Meyer C., Caboche M. J. Biol. Chem. 270:6100-6107(1995).

[1931] 796. Forkhead-associated (FHA) domain profile

[1932] The forkhead-associated (FHA) domain [1,E1] is a putative nuclear signalling domain found in a variety of
otherwise unrelated proteins. The FHA domain comprises approximately 55 to 75 amino acids and contains three highly
conserved blocks separated by divergent spacer regions. Currently it has been found in the following proteins:

Four transcription factors that also contain a forkhead (FH) domain: mouse myocyte nuclear factor 1 (MNF1), yeast
transcription factor FHL1, which probably controls pre-rRNA processing, and yeast FKH1 and FKH2. In those
proteins the FHA domain is located N-terminal of the DNA-binding FH domain.

Kinase-associated protein phosphatase (KAPP) from Arabidopsis thaliana, a protein which specifically interacts
with the receptor-type Ser/Thr-kinase RLK5. In KAPP the FHA domain maps to a region that interacts with the
receptor-type protein kinase RLK5 only if the kinase is phosphorylated on serine residues [2].
Two protein kinases from yeast that are involved in mediating the nuclear response to DNA damage: DUN1 and
SPK1/SAD1 [3]. The latter is the only known protein containing two copies of the FHA domain.

Protein kinase cds1 from fission yeast, which contains a FHA domain and might be the ortholog of SPK1.
Protein kinase MEK1 from yeast, which is involved in meiotic recombination.
Human nuclear antigen Ki67, which is expressed only in proliferating cells.
Yeast hypothetical protein YHR115c, which contains a RING-finger C-terminal of the FHA domain.
Yeast hypothetical proteins L8083.1 and 9346.1, which contain an extensive coiled coil region C-terminal of the
FHA domain.

Caenorhabditis elegans hypothetical protein ZK632.2.
Caenorhabditis elegans hypothetical protein C01G6.5.

FraH from the prokaryote Anabaena, which contains a zinc-finger motif N-terminal of the FHA domain.
An ORF from the bacterium Streptomyces which is on the opposite strand of the protein kinase pKsal, overlapping
the ORF of the kinase.

[1] Hofmann K.O., Bucher P. Trends Biochem. Sci. 20:347-349(1995).
[2] Stone J.M., Collinge M.A., Smith R.D., Horn M.A., Walker J.C. Science 266:793-795(1994).
[3] Navas T.A., Zhou Z., Elledge S.J. Cell 80:29-39(1995).

[1933] 797. Ald_Xan_dh_C

Aldehyde oxidase and xanthine dehydrogenase C terminus

[1934] [1] Romao MJ, Archer M, Moura I, Moura JJ, LeGall J, Engh R, Schneider M, Hof P, Huber R; "Crystal
structure of the xanthine oxidase-related aldehyde oxido-reductase from D. gigas." Science 1995;
270:1170-1178.

Number of members: 54 

[1935] 798. Glyco_hydro_38
Glycosyl hydrolases family 38

[1936] Glycosyl hydrolases are key enzymes of carbohydrate metabolism.

Number of members: 20 

[1937] [1] Henrissat B; Medline: 98313424; "Glycosidase families." Biochem Soc Trans 1998;26:153-156.
[1938] 799. HECT
HECT-domain (ubiquitin-transferase).

[1939] The name HECT comes from Homologous to the E6-AP Carboxyl Terminus. 

Number of members: 43

[1940] [1] Huibregtse JM, Scheffner M, Beaudenon S, Howley PM; Medline: 95223981; "A family of proteins
structurally and functionally related to the E6-AP ubiquitin-protein ligase." Proc Natl Acad Sci U S A 1995;92:2563-2567.
[1941] 800. HRDC
HRDC domain

[1942] The HRDC (Helicase and RNase D C-terminal) domain has a putative role in nucleic acid binding. Mutations
in the HRDC domain cause human disease.

Number of members: 19

[1943] [1] Morozov V, Mushegian AR, Koonin EV, Bork P; Medline: 98060076; "A putative nucleic acid-binding domain
in Bloom's and Werner's syndrome helicases." Trends Biochem Sci 1997;22:417-418.
[1944] 801. Integrase

[1945] Integrase mediates integration of a DNA copy of the viral genome into the host chromosome. Integrase is
composed of three domains. The amino-terminal domain is a zinc binding domain. The central domain is the catalytic
domain [1]. The carboxyl terminal domain is a DNA binding domain [2].
Number of members: 581
[1946]

[1] Dyda F, Hickman AB, Jenkins TM, Engelman A, Craigie R, Davies DR; Medline: 95099322; "Crystal structure
of the catalytic domain of HIV-1 integrase: similarity to other polynucleotidyl transferases." Science 1994;266:
1981-1986.

[2] Lodi PJ, Ernst JA, Kuszewski J, Hickman AB, Engelman A, Craigie R, Clore GM, Gronenborn AM;
"Solution structure of the DNA binding domain of HIV-1 integrase." Biochemistry 1995;34:9826-9833.

[1947] 802. lig_chan
Ligand-gated ion channel

[1948] This family includes the four transmembrane regions of the ionotropic glutamate receptors and NMDA
receptors.

Number of members: 128

[1949] [1] Tong G, Shepherd D, Jahr CE; Medline: 95184014; "Synaptic desensitization of NMDA receptors by
calcineurin." Science 1995;267:1510-1512.
[1950] 803. RhoGAP
RhoGAP domain 

[1951] GTPase activator proteins towards Rho/Rac/Cdc42-like small GTPases.

Number of members: 97 

[1952] 

[1] Musacchio A, Cantley LC, Harrison SC; Medline: 97121392; "Crystal structure of the breakpoint cluster
region-homology domain from phosphoinositide 3-kinase p85 alpha subunit." Proc Natl Acad Sci U S A 1996;93:
14373-14378.

[2] Barrett T, Xiao B, Dodson EJ, Dodson G, Ludbrook SB, Nurmahomed K, Gamblin SJ, Musacchio A, Smerdon
SJ, Eccleston JF; Medline: 97182209; "The structure of the GTPase-activating domain from p50rhoGAP." Nature
1997;385:458-461.

[3] Rittinger K, Walker PA, Eccleston JF, Nurmahomed K, Owen D, Laue E, Gamblin SJ, Smerdon SJ; Medline:
97404320; "Crystal structure of a small G protein in complex with the GTPase-activating protein rhoGAP." Nature
1997;388:693-697.

[4] Boguski MS, McCormick F; Medline: 94081948; "Proteins regulating Ras and its relatives." Nature 1993;366:
643-654.

[1953] 804. vwd

von Willebrand factor type D domain

[1954] [1] Bork P; Medline: 93327926; "The modular architecture of a new family of growth regulators related to
connective tissue growth factor." FEBS Lett 1993;327:125-130.

Number of members: 92

[1955] 805. zf-C4_Topoisom
Topoisomerase DNA binding C4 zinc finger

[1] Tse-Dinh YC, Beran-Steed RK; Medline: 89034032; "Escherichia coli DNA topoisomerase I is a zinc
metalloprotein with three repetitive zinc-binding domains." J Biol Chem 1988;263:15857-15859.

[2] Ahumada A, Tse-Dinh YC; Medline: 99011409; "The Zn(II) binding motifs of E. coli DNA topoisomerase I is part
of a high-affinity DNA binding domain." Biochem Biophys Res Commun 1998;251:509-514.

Number of members: 51

[1956] 806. AIRC
AIR carboxylase
Members of this family catalyse the decarboxylation of 1-(5-phosphoribosyl)-5-amino-4-imidazole-carboxylate (AIR).
This family catalyses the sixth step of de novo purine biosynthesis. Some members of this family contain two copies of
this domain. Number of members: 35
[1957] 807. Bromodomain signature and profile
PROSITE cross-reference(s): PS00633; BROMODOMAIN_1, PS50014;
BROMODOMAIN_2

The bromodomain [1,2,3] is a conserved region of about 70 amino acids found in the following proteins:

Higher eukaryotes transcription initiation factor TFIID 250 Kd subunit (TBP-associated factor p250) (gene CCG1).
P250 is associated with the TFIID TATA-box binding protein and seems essential for progression of the G1 phase
of the cell cycle.

Human RING3, a protein of unknown function encoded in the MHC class II locus.

Mammalian CREB-binding protein (CBP), which mediates cAMP-gene regulation by binding specifically to
phosphorylated CREB protein.

Drosophila female sterile homeotic protein (gene fsh), required maternally for proper expression of other homeotic
genes involved in pattern formation, such as Ubx.

Drosophila brahma protein (gene brm), a protein required for the activation of multiple homeotic genes.
Mammalian homologs of brahma. In human, three brahma-like proteins are known: SNF2a (hBRM), SNF2b, and
BRG1.

Human BS69, a protein that binds to adenovirus E1A and inhibits E1A transactivation.
Human peregrin (or Br140).
Yeast BDF1 [3], a transcription factor involved in the expression of a broad class of genes including snRNAs.
Yeast GCN5, a general transcriptional activator operating in concert with certain other DNA-binding transcriptional
activators, such as GCN4, HAP2/3/4 or ADA2.
Yeast NPS1/STH1, involved in G(2) phase control in mitosis.

Yeast SNF2/SWI2, which is part of a complex with the SNF5, SNF6, SWI3 and ADR6/SWI1 proteins. This
SWI-complex is involved in transcriptional activation.

Yeast SPT7, a transcriptional activator of Ty elements, and possibly other genes.
Caenorhabditis elegans protein cbp-1.
Yeast hypothetical protein YGR056w.
Yeast hypothetical protein YKR008w.
Yeast hypothetical protein L9638.1.

[1958] Some proteins contain a region which, while similar to some extent to a classical bromodomain, diverges from
it by either lacking part of the domain or because of an insertion. These proteins are:

Mammalian protein HRX (also known as All-1 or MLL), a protein involved in translocations leading to acute
leukemias and which possibly acts as a transcriptional regulatory factor. HRX contains a region similar to the C-terminal
half of the bromodomain.

Caenorhabditis elegans hypothetical protein ZK783.4. The bromodomain of this protein has a 23 amino-acid
insertion.

Yeast protein YTA7. This protein contains a region with significant similarity to the C-terminal half of the
bromodomain. As it is a member of the AAA family (see <PDOC00572>) it is also in a functionally different context.

[1959] The above proteins generally contain a single bromodomain, but some of them contain two copies; this is the
case of BDF1, CCG1, fsh, RING3, YKR008w and L9638.1.

[1960] The exact function of this domain is not yet known, but it is thought to be involved in protein-protein interactions
and it may be important for the assembly or activity of multicomponent complexes involved in transcriptional activation.

[1961] The consensus pattern that has been developed spans a major part of the bromodomain; a more sensitive
detection is available through the use of a profile which spans the whole domain.
Consensus pattern: [STANVF]-x(2)-F-x(4)-[DNS]-x(5,7)- ... -x(6,8)-Y-x(12,13)-[LIVM]-x(2)-N-[SACF]-x(2)-[FY]

References
[1962]

[1] Haynes S.R., Dollard C., Winston F., Beck S., Trowsdale J., Dawid I.B. Nucleic Acids Res. 20:2603-2603
(1992).
[2] Tamkun J.W., Deuring R., Scott M.P., Kissinger M., Pattatucci A.M., Kaufman T.C., Kennison J.A. Cell 68:
561-572(1992).
[3] Tamkun J.W. Curr. Opin. Genet. Dev. 5:473-477(1995).

[1963] 808. (CH) Actinin-type actin-binding domain signatures

PROSITE cross-reference(s): PS00019; ACTININ_1, PS00020; ACTININ_2

[1964] Alpha-actinin is an F-actin cross-linking protein which is thought to anchor actin to a variety of intracellular
structures [1]. The actin-binding domain of alpha-actinin seems to reside in the first 250 residues of the protein. A
similar actin-binding domain has been found in the N-terminal region of many different actin-binding proteins [2,3]:

In the beta chain of spectrin (or fodrin).

In dystrophin, the protein defective in Duchenne muscular dystrophy (DMD) and which may play a role in anchoring
the cytoskeleton to the plasma membrane.
In the slime mold gelation factor (or ABP-120).
In actin-binding protein ABP-280 (or filamin), a protein that links actin filaments to membrane glycoproteins.

In fimbrin (or plastin), an actin-bundling protein. Fimbrin differs from the above proteins in that it contains two
tandem copies of the actin-binding domain and that these copies are located in the C-terminal part of the protein.

[1965] Two conserved regions were selected as signature patterns for this type of domain. The first of these regions is
located at the beginning of the domain, while the second one is located in the central section and has been shown to
be essential for the binding of actin.

[1966] Consensus pattern: [EQ]-x(2)-[ATV]-[FY]-x(2)-W-x-N

Consensus pattern: [LIVM]-x-[SGN]-[LIVM]-[DAGHE]-[SAG]-x-[DNEAG]-[LIVM]-x-[DEAG]-x(4)-[LIVM]-x-[LM]-[SAG]-
[LIVM]-[LIVMT]-W-x-[LIVM](2)
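
Because fimbrin carries two tandem copies of the domain, counting signature hits gives a rough estimate of how many copies a protein contains. The Python sketch below is illustrative: it assumes the first consensus pattern reads [EQ]-x(2)-[ATV]-[FY]-x(2)-W-x-N, transcribes it by hand into a regular expression, and wraps it in a lookahead so that overlapping hits are also reported. The function name is hypothetical.

```python
import re

# Hand transcription of the assumed first actinin-type pattern, placed in a
# zero-width lookahead so that overlapping occurrences are all found.
ACTININ_1 = re.compile(r"(?=([EQ].{2}[ATV][FY].{2}W.N))")


def signature_starts(sequence: str):
    """Return the 1-based start position of every signature occurrence."""
    return [match.start() + 1 for match in ACTININ_1.finditer(sequence)]
```

The length of the returned list is the number of hits. Two well separated hits would be consistent with tandem domain copies, though a short pattern like this one can also match by chance.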

[1] Schleicher M., Andre E., Hartmann A., Noegel A.A. Dev. Genet. 9:521-530(1988).
[2] Matsudaira P. Trends Biochem. Sci. 16:87-92(1991).
[3] Dubreuil R.R. BioEssays 13:219-226(1991).

[1967] 809. (COX1) Heme-copper oxidase subunit I, copper B binding region signature
PROSITE cross-reference(s): PS00077; COX1

Heme-copper respiratory oxidases [1] are oligomeric integral membrane protein complexes that catalyze the terminal
step in the respiratory chain: they transfer electrons from cytochrome c or a quinol to oxygen. Some terminal oxidases
generate a transmembrane proton gradient across the plasma membrane (prokaryotes) or the mitochondrial inner
membrane (eukaryotes). The enzyme complex consists of 3-4 subunits (prokaryotes) up to 13 polypeptides (mammals),
of which only the catalytic subunit (equivalent to mammalian subunit 1 (CO I)) is found in all heme-copper respiratory
oxidases. The presence of a bimetallic center (formed by a high-spin heme and copper B) as well as a low-spin heme,
both ligated to six conserved histidine residues near the outer side of four transmembrane spans within CO I, is common
to all family members [2-4].

[1968] In contrast to eukaryotes, the respiratory chain of prokaryotes is branched to multiple terminal oxidases. The
enzyme complexes vary in heme and copper composition, substrate type and substrate affinity. The different respiratory
oxidases allow the cells to customize their respiratory systems according to a variety of environmental growth conditions
[1].

[1969] Recently also a component of an anaerobic respiratory chain has been found to contain the copper B binding
signature of this family: nitric oxide reductase (NOR) exists in denitrifying species of Archaea and Eubacteria.
[1970] Enzymes that belong to this family are: 

Mitochondrial-type cytochrome c oxidase (EC 1.9.3.1), which uses cytochrome c as electron donor. The electrons
are transferred via copper A (Cu(A)) and heme a to the bimetallic center of CO I that is formed by a
penta-coordinated heme a and copper B (Cu(B)). Subunit 1 contains 12 transmembrane regions. Cu(B) is said to be ligated
to three of the conserved histidine residues within the transmembrane segments 6 and 7.

Quinol oxidase from prokaryotes that transfers electrons from a quinol to the binuclear center of polypeptide I. This
category of enzymes includes Escherichia coli cytochrome O terminal oxidase complex, which is a component of
the aerobic respiratory chain that predominates when cells are grown at high aeration.
FixN, the catalytic subunit of a cytochrome c oxidase expressed in nitrogen-fixing bacteroids living in root nodules.
The high affinity for oxygen allows oxidative phosphorylation under low oxygen concentrations. A similar enzyme
has been found in other purple bacteria.

Nitric oxide reductase (EC 1.7.99.7) from Pseudomonas stutzeri. NOR reduces nitrate to dinitrogen. It is a
heterodimer of norC and the catalytic subunit norB. The latter contains the 6 invariant histidine residues and 12
transmembrane segments [5].

[1971] As a signature pattern the copper-binding region was used.
[1972] Consensus pattern: [YWG]-[LIVFYWTA](2)-[VGS]-H-[LNP]-x-V-x(44,47)-H-H [The three H's are copper B ligands]

[1973] Note: cytochrome bd complexes do not belong to this family.
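The signature above can be used not only to detect a family member but also to locate the three histidines described as copper B ligands. The following is a minimal sketch in Python, assuming the signature reads [YWG]-[LIVFYWTA](2)-[VGS]-H-[LNP]-x-V-x(44,47)-H-H; the function name `copper_b_ligands` and the toy sequence are hypothetical and serve illustration only.

```python
import re

# Assumed reading of the copper-binding signature; each ligand histidine
# is wrapped in a capture group so that its position can be reported.
COX1_SIGNATURE = re.compile(r"[YWG][LIVFYWTA]{2}[VGS](H)[LNP].V.{44,47}(H)(H)")


def copper_b_ligands(sequence):
    """Return the 1-based positions of the three putative copper B ligands,
    or None if the signature is not found in the sequence."""
    match = COX1_SIGNATURE.search(sequence)
    if match is None:
        return None
    return [match.start(group) + 1 for group in (1, 2, 3)]


# Hypothetical toy sequence: a signature start, a 45-residue spacer, then H-H.
toy_sequence = "M" + "WLLGHPAV" + "A" * 45 + "HH" + "K"
print(copper_b_ligands(toy_sequence))
```

The variable spacer x(44,47) maps directly onto the bounded quantifier `.{44,47}`, so no special handling of the gap length is needed.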
[1]
Garcia-Horsman J.A., Barquera B., Rumbley J., Ma J., Gennis R.B.
J. Bacteriol. 176:5587-5800(1994).
[2]
Castresana J., Luebben M., Saraste M., Higgins D.G.
EMBO J. 13:2516-2525(1994).
[3]
Capaldi R.A., Malatesta F., Darley-Usmar V.M.
Biochim. Biophys. Acta 728:135-148(1983).
[4]
Holm L., Saraste M., Wikstrom M.
EMBO J. 6:2819-2823(1987).
[5]
Saraste M., Castresana J.
FEBS Lett. 341:1-4(1994).



[1974] 810. (dehydrog_molyb) Eukaryotic molybdopterin oxidoreductases signature
PROSITE cross-reference(s): PS00559; MOLYBDOPTERIN_EUK

[1975] A number of different eukaryotic oxidoreductases that require and bind a molybdopterin cofactor have been shown [1] to share a few regions of sequence similarity. These enzymes are:

- Xanthine dehydrogenase (EC 1.1.1.204), which catalyzes the oxidation of xanthine to uric acid with the concomitant reduction of NAD. Structurally, this enzyme of about 1300 amino acids consists of at least three distinct domains: an N-terminal 2Fe-2S ferredoxin-like iron-sulfur binding domain (see <PDOC00175>), a central FAD/NAD-binding domain and a C-terminal Mo-pterin domain.
- Aldehyde oxidase (EC 1.2.3.1), which catalyzes the oxidation of aldehydes into acids. Aldehyde oxidase is highly similar to xanthine dehydrogenase in its sequence and domain structure.
- Nitrate reductase (EC 1.6.6.1), which catalyzes the reduction of nitrate to nitrite. Structurally, this enzyme of about 900 amino acids consists of an N-terminal Mo-pterin domain, a central cytochrome b5-type heme-binding domain (see <PDOC00170>) and a C-terminal FAD/NAD-binding cytochrome reductase domain.
- Sulfite oxidase (EC 1.8.3.1), which catalyzes the oxidation of sulfite to sulfate. Structurally, this enzyme of about 460 amino acids consists of an N-terminal cytochrome b5-binding domain followed by a Mo-pterin domain.



[1976] There are a few conserved regions in the sequence of the molybdopterin-binding domain of these enzymes. The pattern used to detect these proteins is based on one of them. It contains a cysteine residue which could be involved in binding the molybdopterin cofactor.
[1977] Consensus pattern: [GA]-x(3)-[KRNQHT]-
[1]
Wootton J.C., Nicolson R.E., Cock J.M., Walters D.E., Burke J.F., Doyle W.A., Bray R.C.
Biochim. Biophys. Acta 1057:157-185(1991).

811. (DNA_ligase) ATP-dependent DNA ligase signatures
PROSITE cross-reference(s): PS00697; DNA_LIGASE_A1, PS00333; DNA_LIGASE_A2
[1978] DNA ligase (polydeoxyribonucleotide synthase) is the enzyme that joins two DNA fragments by catalyzing the formation of an internucleotide ester bond between phosphate and deoxyribose. It is active during DNA replication, DNA repair and DNA recombination. There are two forms of DNA ligase: one requires ATP (EC 6.5.1.1), the other NAD (EC 6.5.1.2).

[1979] Eukaryotic, archaebacterial, virus and phage DNA ligases are ATP-dependent. During the first step of the joining reaction, the ligase interacts with ATP to form a covalent enzyme-adenylate intermediate. A conserved lysine residue is the site of adenylation [1,2].

[1980] Apart from the active site region, the only conserved region common to all ATP-dependent DNA ligases is found [3] in the C-terminal section and contains a conserved glutamate as well as four positions with conserved basic residues.

[1981] Signature patterns were developed for both conserved regions.
[1982] Consensus pattern: [EDQH]-x-K-x-[DN]-G-x-R-[GACIVM] [K is the active site residue]
[1983] Consensus pattern: E-G-[LIVMA]-[LIVM](2)-[KR]-x(5,8)-[YW]-[QNEK]-x(2,6)-[KRH]-x(3,5)-K-[LIVMFY]-K
Sequences known to belong to this class detected by the pattern: ALL, except for archaebacterial DNA ligases.

[1]
Tomkinson A.E., Totty N.F., Ginsburg M., Lindahl T.
Proc. Natl. Acad. Sci. U.S.A. 88:400-404(1991).
[2]
Lindahl T., Barnes D.E.
Annu. Rev. Biochem. 61:251-281(1992).
[3]
Kletzin A.
Nucleic Acids Res. 20:5389-5396(1992).

[1984] 812. (FAD_Gly3P_dh) FAD-dependent glycerol-3-phosphate dehydrogenase signatures
PROSITE cross-reference(s): PS00977; FAD_G3PDH_1, PS00978; FAD_G3PDH_2

[1985] FAD-dependent glycerol-3-phosphate dehydrogenase (EC 1.1.99.5) (GPD) catalyzes the conversion of glycerol-3-phosphate into dihydroxyacetone phosphate. In bacteria [1] it is associated with the utilization of glycerol coupled to respiration. In Escherichia coli two isozymes are known: one expressed under anaerobic conditions (gene glpA) and one in aerobic conditions (gene glpD). In eukaryotes a mitochondrial form of GPD participates in the glycerol phosphate shuttle in conjunction with an NAD-dependent cytoplasmic GPD (EC 1.1.1.8) [2,3].
[1986] These enzymes are proteins of about 60 to 70 Kd which contain a probable FAD-binding domain in their N-terminal extremity. The mammalian enzyme differs from the bacterial or yeast proteins by having an EF-hand calcium-binding region (see <PDOC00018>) in its C-terminal extremity.

[1987] Two signature patterns were developed: one based on the first half of the FAD-binding domain and one which corresponds to a conserved region in the central part of these enzymes.
[1988] Consensus pattern: [IV]-G-G-G-x(2)-G-[STACV]-G-x-A-x-D-x(3)-R-G
Consensus pattern: G-G-K-x(2)-[GSTE]-Y-R-x(2)-A

[1]
Austin D., Larson T.J.
J. Bacteriol. 173:101-107(1991).
[2]
Roennow B., Kielland-Brandt M.C.
Yeast 9:1121-1130(1993).
[3]
Brown L.J., McDonald M.J., Lenn D.A., Moran S.M.
J. Biol. Chem. 269:14363-14366(1994).

[1989] 813. (Fapy_DNA_glyco) Formamidopyrimidine-DNA glycosylase signature
PROSITE cross-reference(s): PS01242; FPG

[1990] Formamidopyrimidine-DNA glycosylase (EC 3.2.2.23) [1] (Fapy-DNA glycosylase) (gene fpg) is a bacterial enzyme involved in DNA repair and which excises oxidized purine bases to release 2,6-diamino-4-hydroxy-5N-methylformamidopyrimidine (Fapy) and 7,8-dihydro-8-oxoguanine (8-oxoG) residues. In addition to its glycosylase activity, FPG can also nick DNA at apurinic/apyrimidinic sites (AP sites). FPG is a monomeric protein of about 32 Kd which binds and requires zinc for its activity.

[1991] The binding site for zinc seems to be located in the C-terminal part of the enzyme, where four conserved and essential [2] cysteines are located. A signature pattern was developed based on this region.

[1992] Consensus pattern: C-x(2,4)-C-x-[GTAQ]-x-[IV]-x(7)-R-[GSTAN]-[STA]-x-[FYI]-C-x(2)-C-Q
[The four C's are putative zinc ligands]



[1]
Duwat P., de Oliveira R., Ehrlich S.D., Boiteux S.
Microbiology 141:411-417(1995).
[2]
O'Connor T.E., Graves R.J., Demurcia G., Castaing B., Laval J.
J. Biol. Chem. 268:9063-9070(1993).



[1993] 814. (G_glu_transpept) Gamma-glutamyltranspeptidase signature
PROSITE cross-reference(s): PS00462; G_GLU_TRANSPEPTIDASE

[1994] Gamma-glutamyltranspeptidase (EC 2.3.2.2) (GGT) [1] catalyzes the transfer of the gamma-glutamyl moiety of glutathione to an acceptor that may be an amino acid, a peptide or water (forming glutamate). GGT plays a key role in the gamma-glutamyl cycle, a pathway for the synthesis and degradation of glutathione. In prokaryotes and eukaryotes, it is an enzyme that consists of two polypeptide chains, a heavy and a light subunit, processed from a single chain precursor. The active site of GGT is known to be located in the light subunit.

[1995] The sequences of mammalian and bacterial GGT show a number of regions of high similarity [2]. Pseudomonas cephalosporin acylases (EC 3.5.1.-) that convert 7-beta-(4-carboxybutanamido)-cephalosporanic acid (GL-7ACA) into 7-aminocephalosporanic acid (7ACA) and glutaric acid are evolutionarily related to GGT and also show some GGT activity [3]. Like GGT, these GL-7ACA acylases are also composed of two subunits.
[1996] One of the conserved regions corresponds to the N-terminal extremity of the mature light chains of these enzymes. This region was used as a signature pattern.

[1997] Consensus pattern: T-[STA]-H-x-[ST]-[LIVMA]-x(4)-



[1]
Tate S.S., Meister A.
Meth. Enzymol. 113:400-419(1985).
[2]
Suzuki H., Kumagai H., Echigo T., Tochikura T.
J. Bacteriol. 171:5169-5172(1989).
[3]
Ishiye M., Niwa M.
Biochim. Biophys. Acta 1132:233-239(1992).



[1998] 815. G-protein gamma subunit profile
PROSITE cross-reference(s): PS50058; G_PROTEIN_GAMMA

[1999] Guanine nucleotide-binding proteins (G proteins) [1] act as intermediaries in the transduction of signals generated by transmembrane receptors. G proteins consist of three subunits (alpha, beta, and gamma). The alpha subunit binds to and hydrolyzes GTP; the functions of the beta and gamma subunits are less clear but they seem to be required for the replacement of GDP by GTP as well as for membrane anchoring and receptor recognition.
[2000] The gamma subunits are small proteins (from 70 to 110 residues) that are bound to the membrane via an isoprenyl group (either a farnesyl or a geranylgeranyl) covalently linked to their C-terminus. In mammals there are at least 12 different isoforms of gamma subunits.

[2001] The Caenorhabditis elegans protein egl-10, which is a regulator of G-protein signalling, contains a G-protein gamma-like domain.

[2002] A profile was developed that spans the complete length of the gamma subunit.

[1]
Pennington S.R.
Protein Prof. 2:18-315(1995).
[2003] 816. GNS1/SUR4 family signature
PROSITE cross-reference(s): PS01188; GNS1_SUR4
[2004] The following group of eukaryotic integral membrane proteins, whose exact function has not yet clearly been established, are evolutionarily related [1]:

- Yeast GNS1 [2], a protein involved in synthesis of 1,3-beta-glucan.
- Yeast SUR4 (or APA1, SRE1) [3], a protein that could act in a glucose-signaling pathway that controls the expression of several genes that are transcriptionally regulated by glucose.
- Yeast hypothetical protein YJL198c.
- Caenorhabditis elegans hypothetical protein C40H1.4.
- Caenorhabditis elegans hypothetical protein D2024.3.

[2005] The proteins have from 290 to 435 amino acid residues. Structurally, they seem to be formed of three sections: an N-terminal region with two transmembrane domains, a central hydrophilic loop and a C-terminal region that contains from one to three transmembrane domains. A conserved region that contains three histidines was selected as a signature pattern. This region is located in the hydrophilic loop.
Consensus pattern: L-x-F-L-H-x-Y-H-H
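Consensus patterns of this kind follow the PROSITE notation, in which elements are separated by hyphens, `x` stands for any residue, square brackets list the allowed residues, and a number in parentheses gives a repeat count or range. The following is a minimal sketch in Python of how such a pattern could be translated into a regular expression and applied to a sequence. The function name `prosite_to_regex` and the test sequence are hypothetical, and only the subset of the notation that appears in this section is handled.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern, e.g. 'L-x-F-L-H-x-Y-H-H',
    into an equivalent regular expression string."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        # Split off an optional repeat count such as '(4)' or '(2,6)'.
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.group(1), m.group(2), m.group(3)
        if core == "x":
            regex = "."                       # any residue
        elif core.startswith("[") and core.endswith("]"):
            regex = core                      # set of allowed residues
        elif core.startswith("{") and core.endswith("}"):
            regex = "[^" + core[1:-1] + "]"   # set of forbidden residues
        else:
            regex = re.escape(core)           # one fixed residue
        if low is not None:
            regex += "{" + low + ("," + high if high else "") + "}"
        parts.append(regex)
    return "".join(parts)


gns1_regex = prosite_to_regex("L-x-F-L-H-x-Y-H-H")
# Hypothetical sequence fragment, used only to exercise the pattern.
hit = re.search(gns1_regex, "MKALAFLHVYHHGT")
print(gns1_regex, hit.span() if hit else None)
```

Keeping the translation separate from the search makes it possible to reuse the same helper for the other consensus patterns listed in this document.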



[1]
Bairoch A.
Unpublished observations (1998).
[2]
El-Sherbeini M., Clemas J.A.
J. Bacteriol. 177:3227-3234(1995).
[3]
Garcia-Arranz M., Maldonado A.M., Mazon M.J., Portillo F.
J. Biol. Chem. 269:18076-18082(1994).



[2006] 817. Immunoglobulins and major histocompatibility complex proteins signature
PROSITE cross-reference(s): PS00290; IG_MHC

[2007] The basic structure of immunoglobulin (Ig) [1] molecules is a tetramer of two light chains and two heavy chains linked by disulfide bonds. There are two types of light chains: kappa and lambda, each composed of a constant domain (CL) and a variable domain (VL). There are five types of heavy chains: alpha, delta, epsilon, gamma and mu, all consisting of a variable domain (VH) and three (in alpha, delta and gamma) or four (in epsilon and mu) constant domains (CH1 to CH4).

[2008] The major histocompatibility complex (MHC) molecules are made of two chains. In class I [2] the alpha chain is composed of three extracellular domains, a transmembrane region and a cytoplasmic tail. The beta chain (beta-2-microglobulin) is composed of a single extracellular domain. In class II [3], both the alpha and the beta chains are composed of two extracellular domains, a transmembrane region and a cytoplasmic tail.

[2009] It is known [4,5] that the Ig constant chain domains and a single extracellular domain in each type of MHC chains are related. These homologous domains are approximately one hundred amino acids long and include a conserved intradomain disulfide bond. A small pattern around the C-terminal cysteine involved in this disulfide bond can be used to detect this category of Ig related proteins.

[2010] Consensus pattern: [FY]-x-C-x-[VA]-x-H
Sequences known to belong to this class detected by the pattern:

- Ig heavy chains type Alpha C region: All, in CH2 and CH3.
- Ig heavy chains type Delta C region: All, in CH3.
- Ig heavy chains type Epsilon C region: All, in CH1, CH3 and CH4.
- Ig heavy chains type Gamma C region: All, in CH3 and also CH1 in some cases.
- Ig heavy chains type Mu C region: All, in CH2, CH3 and CH4.
- Ig light chains type Kappa C region: In all CL except rabbit and Xenopus.
- Ig light chains type Lambda C region: In all CL except rabbit.
- MHC class I alpha chains: All, in alpha-3 domains, including in the cytomegalovirus MHC-1 homologous protein [6].
- Beta-2-microglobulin: All.
- MHC class II alpha chains: All, in alpha-2 domains.
- MHC class II beta chains: All, in beta-2 domains.



[1]
Gough N.
Trends Biochem. Sci. 6:203-205(1981).
[2]
Klein J., Figueroa F.
Immunol. Today 7:41-44(1986).
[3]
Figueroa F., Klein J.
Immunol. Today 7:78-81(1986).
[4]
Orr H.T., Lancet D., Robb R.J., Lopez de Castro J.A., Strominger J.L.
Nature 282:266-270(1979).
[5]
Cushley W., Owen M.J.
Immunol. Today 4:88-92(1983).
[6]
Beck S., Barrell B.G.
Nature 331:269-272(1988).

[2011] 818. (IGFBP) Insulin-like growth factor binding proteins signature
PROSITE cross-reference(s): IGF_BINDING

[2012] The insulin-like growth factors (IGF-I and IGF-II) bind to specific binding proteins in extracellular fluids with high affinity [1,2,3]. These IGF-binding proteins (IGFBP) prolong the half-life of the IGFs and have been shown to either inhibit or stimulate the growth promoting effects of the IGFs on cell culture. They seem to alter the interaction of IGFs with their cell surface receptors. There are at least six different IGFBPs and they are structurally related.
[2013] The following growth-factor inducible proteins are structurally related to IGFBPs and could function as growth-factor binding proteins [4,5]:

- Mouse protein cyr61 and its probable chicken homolog, protein CEF-10.
- Human connective tissue growth factor (CTGF) and its mouse homolog, protein FISP-12.
- Vertebrate protein NOV.

[2014] As a signature pattern a conserved cysteine-rich region located in the N-terminal section of these proteins is used.

[2015] Consensus pattern: G-C-[GS]-C-C-x(2)-C-A-x(6)-C
Sequences known to belong to this class detected by the pattern: ALL, except for IGFBP-6's.

[1]
Rechler M.M.
Vitam. Horm. 47:1-114(1993).
[2]
Shimasaki S., Ling N.
Prog. Growth Factor Res. 3:243-266(1991).
[3]
Clemmons D.R.
Trends Endocrinol. Metab. 1:412-417(1990).
[4]
Bradham D.M., Igarashi A., Potter R.L., Grotendorst G.R.
J. Cell Biol. 114:1285-1294(1991).
[5]
Maloisel V., Martinerie C., Dambrine G., Plassiart G., Brisac M., Crochet J., Perbal B.
Mol. Cell. Biol. 12:10-21(1992).



[2016] 819. LMWPc: Low molecular weight phosphotyrosine protein phosphatase
Number of members: 34

[2017] [1] Medline: 94329182. The crystal structure of a low-molecular-weight phosphotyrosine protein phosphatase. Su XD, Taddei N, Stefani M, Ramponi G, Nordlund P; Nature 1994;370:575-578.

[2018] 820. (myosin_head) ATP/GTP-binding site motif A (P-loop)
PROSITE cross-reference(s): PS00017; ATP_GTP_A
[2019] From sequence comparisons and crystallographic data analysis, it has been shown [1,2,3,4,5,6] that an appreciable proportion of proteins that bind ATP or GTP share a number of more or less conserved sequence motifs. The best conserved of these motifs is a glycine-rich region, which typically forms a flexible loop between a beta-strand and an alpha-helix. This loop interacts with one of the phosphate groups of the nucleotide. This sequence motif is generally referred to as the 'A' consensus sequence [1] or the 'P-loop' [5].
[2020] There are numerous ATP- or GTP-binding proteins in which the P-loop is found. A number of protein families for which the relevance of the presence of such motif has been noted is listed below:

- ATP synthase alpha and beta subunits (see <PDOC00137>).
- Myosin heavy chains.
- Kinesin heavy chains and kinesin-like proteins (see <PDOC00343>).
- Dynamins and dynamin-like proteins (see <PDOC00362>).
- Guanylate kinase (see <PDOC00670>).
- Thymidine kinase (see <PDOC00524>).
- Thymidylate kinase (see <PDOC01034>).
- Shikimate kinase (see <PDOC00868>).
- Nitrogenase iron protein family (nifH/frxC) (see <PDOC00580>).
- ATP-binding proteins involved in 'active transport' (ABC transporters) [7] (see <PDOC00185>).
- DNA and RNA helicases [8,9,10].
- GTP-binding elongation factors (EF-Tu, EF-1alpha, EF-G, EF-2, etc.).
- Ras family of GTP-binding proteins (Ras, Rho, Rab, Ral, Ypt1, SEC4, etc.).
- Nuclear protein ran (see <PDOC00859>).
- ADP-ribosylation factors family (see <PDOC00781>).
- Bacterial dnaA protein (see <PDOC00771>).
- Bacterial recA protein (see <PDOC00131>).
- Bacterial recF protein (see <PDOC00539>).
- Guanine nucleotide-binding proteins alpha subunits (Gi, Gs, Gt, G0, etc.).
- DNA mismatch repair proteins mutS family (see <PDOC00388>).
- Bacterial type II secretion system protein E (see <PDOC00567>).

[2021] Not all ATP- or GTP-binding proteins are picked up by this motif. A number of proteins escape detection because the structure of their ATP-binding site is completely different from that of the P-loop. Examples of such proteins are the E1-E2 ATPases or the glycolytic kinases. In other ATP- or GTP-binding proteins the flexible loop exists in a slightly different form; this is the case for tubulins or protein kinases. A special mention must be reserved for adenylate kinase, in which there is a single deviation from the P-loop pattern: in the last position Gly is found instead of Ser or Thr.
[2022] Consensus pattern: [AG]-x(4)-G-K-[ST]
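The behaviour described above, where the strict motif misses adenylate kinase because of a single substitution in the last position, can be illustrated with a short scan. This is a minimal sketch in Python, assuming the P-loop consensus reads [AG]-x(4)-G-K-[ST], which agrees with the statement that the last position is normally Ser or Thr. The relaxed variant, the function name `find_motif` and both sequence fragments are hypothetical illustrations and are not part of the PROSITE entry.

```python
import re

# Assumed strict P-loop motif, [AG]-x(4)-G-K-[ST]. The lookahead wrapper
# lets overlapping occurrences be reported as separate hits.
P_LOOP = re.compile(r"(?=([AG].{4}GK[ST]))")

# Hypothetical relaxed form that also accepts Gly in the last position,
# the single deviation described for adenylate kinase.
P_LOOP_RELAXED = re.compile(r"(?=([AG].{4}GK[STG]))")


def find_motif(regex, sequence):
    """Return (1-based start, matched segment) for every hit."""
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]


# Illustrative fragments only: one Ras-like, one adenylate-kinase-like.
ras_like = "MTEYKLVVVGAGGVGKSALTIQ"
adk_like = "IIFVVGGPGSGKGTQCEK"

print(find_motif(P_LOOP, ras_like))
print(find_motif(P_LOOP, adk_like))
print(find_motif(P_LOOP_RELAXED, adk_like))
```

Relaxing a motif in this way increases sensitivity at the cost of more chance matches, which is one reason a short pattern such as the P-loop is usually treated as supporting evidence and not as proof of nucleotide binding.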

[1]
Walker J.E., Saraste M., Runswick M.J., Gay N.J.
EMBO J. 1:946-951(1982).
[2]
Moller W., Amons R.
FEBS Lett. 186:1-7(1985).
[3]
Fry D.C., Kuby S.A., Mildvan A.S.
Proc. Natl. Acad. Sci. U.S.A. 83:907-911(1986).
[4]
Dever T.E., Glynias M.J., Merrick W.C.
Proc. Natl. Acad. Sci. U.S.A. 84:1814-1818(1987).
[5]
Saraste M., Sibbald P.R., Wittinghofer A.
Trends Biochem. Sci. 15:430-434(1990).
[6]
Koonin E.V.
J. Mol. Biol. 229:1165-1174(1993).
[7]
Higgins C.F., Hyde S.C., Mimmack M.M., Gileadi U., Gill D.R., Gallagher M.P.
J. Bioenerg. Biomembr. 22:571-592(1990).
[8]
Hodgman T.C.
Nature 333:22-25(1988) and Nature 333:578-578(1988) (Errata).
[9]
Linder P., Lasko P., Ashburner M., Leroy P., Nielsen P.J., Nishi K., Schnier J., Slonimski P.P.
Nature 337:121-122(1989).
[10]
Gorbalenya A.E., Koonin E.V., Donchenko A.P., Blinov V.M.
Nucleic Acids Res. 17:4713-4730(1989).

[2023] 821. PE: PE family 

This family is named after a PE motif near the amino terminus of the domain. The PE family of proteins all contain an amino-terminal region of about 110 amino acids. The carboxyl termini of this family are variable and fall into several classes. The largest class of PE proteins is the highly repetitive PGRS class, which has a high glycine content. The function of these proteins is uncertain but it has been suggested that they may be related to antigenic variation of Mycobacterium tuberculosis [1]. Number of members: 88

[2024] [1] Medline: 98295987. Deciphering the biology of Mycobacterium tuberculosis from the complete genome sequence. Cole ST, Brosch R, Parkhill J, Garnier T, Churcher C, Harris D, Gordon SV, Eiglmeier K, Gas S, Barry CE 3rd, Tekaia F, Badcock K, Basham D, Brown D, Chillingworth T, Connor R, Davies R, Devlin K, Feltwell T, Gentles S, Hamlin N, Holroyd S, Hornsby T, Jagels K, Barrell BG, et al; Nature 1998;393:537-544.
[2025] 822. (RNB) Ribonuclease II family signature
PROSITE cross-reference(s): PS01175; RIBONUCLEASE_II
[2026] On the basis of sequence similarities, the following bacterial and eukaryotic proteins seem to form a family:

- Escherichia coli and related bacteria ribonuclease II (EC 3.1.13.1) (RNase II) (gene rnb) [1]. RNase II is an exonuclease involved in mRNA decay. It degrades mRNA by hydrolyzing single-stranded polyribonucleotides processively in the 3' to 5' direction.
- Bacterial protein vacB. In Shigella flexneri, vacB has been shown to be required for the expression of virulence genes at the posttranscriptional level.
- Yeast protein SSD1 (or SRK1), which is implicated in the control of the cell cycle G1 phase.
- Yeast protein DIS3 [2], which binds to ran (GSP1) and enhances the nucleotide-releasing activity of RCC1 on ran.
- Fission yeast protein dis3, which is implicated in mitotic control.
- Neurospora crassa cyt-4, a mitochondrial protein required for RNA 5' and 3' end processing and splicing.
- Yeast protein MSU1, which is involved in mitochondrial biogenesis.
- Synechocystis strain PCC 6803 protein zam [3], which controls resistance to the carbonic anhydrase inhibitor acetazolamide.
- Caenorhabditis elegans hypothetical protein F48E8.6.

[2027] The size of these proteins ranges from 644 residues (rnb) to 1210 (SSD1). While their sequence is highly divergent, they share a conserved domain in their C-terminal section [4]. It is possible that this domain plays a role in a putative exonuclease function that would be common to all these proteins. A signature pattern was developed based on the core of this conserved domain.
[2028] Consensus pattern: [HI]-[FYE]-[GSTAM]-[LIVM]-x(4,5)-Y-[STAL]-x-[FWVAC]-[TV]-[SA]-P-[LIVMA]-[RQ]-[KR]-[FY]-x-D-x(3)-[HQ]



[1]
Zilhao R., Camelo L., Arraiano C.M.
Mol. Microbiol. 8:43-51(1993).
[2]
Noguchi E., Hayashi N., Azuma Y., Seki T., Nakamura M., Nakashima N., Yanagida M., He X., Mueller U., Sazer S., Nishimoto T.
EMBO J. 15:5595-5605(1996).
[3]
Beuf L., Bedu S., Cami B., Joset F.
Plant Mol. Biol. 27:779-788(1995).
[4]
Mian I.S.
Nucleic Acids Res. 25:3187-3195(1997).



[2029] 823. Src homology 2 (SH2) domain profile
PROSITE cross-reference(s): PS50001; SH2

[2030] The Src homology 2 (SH2) domain is a protein domain of about 100 amino-acid residues first identified as a conserved sequence region between the oncoproteins Src and Fps [1]. Similar sequences were later found in many other intracellular signal-transducing proteins [2]. SH2 domains function as regulatory modules of intracellular signalling cascades by interacting with high affinity to phosphotyrosine-containing target peptides in a sequence-specific and strictly phosphorylation-dependent manner [3,4,5,6].

[2031] The SH2 domain has a conserved 3D structure consisting of two alpha helices and six to seven beta-strands. The core of the domain is formed by a continuous beta-meander composed of two connected beta-sheets [7].
[2032] So far, SH2 domains have been identified in the following proteins:

- Many vertebrate, invertebrate and retroviral cytoplasmic (non-receptor) protein tyrosine kinases, in particular in the Src, Abl, Btk, Csk and ZAP70 families of kinases.
- Mammalian phosphatidylinositol-specific phospholipase C gamma-1 and -2. Two copies of the SH2 domain are found in those proteins in between the catalytic 'X-' and 'Y-boxes' (see <PDOC50007>).
- Mammalian phosphatidylinositol 3-kinase regulatory p85 subunit.
- Some vertebrate and invertebrate protein-tyrosine phosphatases.
- Mammalian Ras GTPase-activating protein (GAP).
- Adaptor proteins mediating binding of guanine nucleotide exchange factors to growth factor receptors: vertebrate GRB2, Caenorhabditis elegans sem-5 and Drosophila DRK.
- Mammalian Vav oncoprotein, a guanine-nucleotide exchange factor of the CDC24 family.
- Miscellaneous proteins interacting with vertebrate receptor protein tyrosine kinases: oncoprotein Crk, mammalian cytoplasmic proteins Nck, Shc.
- STAT proteins (signal transducers and activators of transcription).
- Chicken tensin.
- Yeast transcriptional control protein SPT6.

[2033] The profile developed to detect SH2 domains is based on a structural alignment consisting of 8 gap-free blocks and 7 linker regions totaling 92 match positions.

[1]
Sadowski I., Stone J.C., Pawson T.
Mol. Cell. Biol. 6:4306-4408(1986).
[2]
Russell R.B., Breed J., Barton G.J.
FEBS Lett. 304:15-20(1992).
[3]
Marengere L.E.M., Pawson T.
J. Cell Sci. Suppl. 18:97-104(1994).
[4]
Pawson T., Schlessinger J.
Curr. Biol. 3:434-442(1993).
[5]
Mayer B.J., Baltimore D.
Trends Cell Biol. 3:8-13(1993).
[6]
Pawson T.
Nature 373:573-580(1995).
[7]
Kuriyan J., Cowburn D.
Curr. Opin. Struct. Biol. 3:828-837(1993).

[2034] 824. Sulfate transporters signature 

PROSITE cross-reference(s): PS01130; SULFATE_TRANSP

[2035] A number of proteins involved in the transport of sulfate across a membrane, as well as some yet uncharacterized proteins, have been shown [1,2] to be evolutionarily related. These proteins are:

- Neurospora crassa sulfate permease II (gene cys-14).
- Yeast sulfate permeases (genes SUL1 and SUL2).
- Rat sulfate anion transporter 1 (SAT-1).
- Mammalian DTDST, a probable sulfate transporter which, in human, is involved in the genetic disease diastrophic dysplasia (DTD).
- Sulfate transporters 1, 2 and 3 from the legume Stylosanthes hamata.
- Human pendrin (gene PDS), which is involved in a number of hearing loss genetic diseases.
- Human protein DRA (Down-Regulated in Adenoma).
- Soybean early nodulin 70.
- Escherichia coli hypothetical protein ychM.
- Caenorhabditis elegans hypothetical protein F41D9.5.

[2036] As expected by their transport function, these proteins are highly hydrophobic and seem to contain about 12 transmembrane domains. The best conserved region seems to be located in the second transmembrane region and is used as a signature pattern.

[2037] Consensus pattern: [PAV]-x-Y-[GS]-L-Y-[STAG](2)-x(4)-[LIVFYA]-[LIVST]-[YI]-x(3)-[GA]-[GST]-S-[KR]
[1]
Sandal N.N., Marcker K.A.
Trends Biochem. Sci. 19:19-19(1994).
[2]
Smith F.W., Hawkesford M.J., Prosser I.M., Clarkson D.T.
Mol. Gen. Genet. 247:709-715(1995).

[2038] 825. TYA: TYA transposon protein

Ty are yeast transposons. A 5.7 kb transcript codes for p3, a fusion protein of TYA and TYB. The TYA protein is analogous to the gag protein of retroviruses. TYA is cleaved to form a 48 kd protein which can form mature virion-like particles [1].
Number of members: 59

[2039] [1] Medline: 97404699. Cryo-electron microscopy structure of yeast Ty retrotransposon virus-like particles. Palmer KJ, Tichelaar W, Myers N, Burns NR, Butcher SJ, Kingsman AJ, Fuller SD, Saibil HR; J Virol 1997;71:6863-6868.
[2040] 826. Aldolase_II

Class II Aldolase and Adducin N-terminal domain.

-!- This family includes class II aldolases and adducins which have not been ascribed any enzymatic function. Number of members: 37

References: 

[2041] 

[1] Medline: 93294819. The spatial structure of the class II L-fuculose-1-phosphate aldolase from Escherichia coli. Dreyer MK, Schulz GE; J Mol Biol 1993;231:549-553.

[2] Medline: 96256522. Catalytic mechanism of the metal-dependent fuculose aldolase from Escherichia coli as derived from the structure. Dreyer MK, Schulz GE; J Mol Biol 1996;259:458-466.

[2042] 827. CBD_2

-!- Two tryptophan residues are involved in cellulose binding.

-!- Cellulose binding domain found in bacteria. Number of members: 51

References:

[2043] [1] Medline: 95284032. Solution structure of a cellulose-binding domain from Cellulomonas fimi by nuclear magnetic resonance spectroscopy. Xu GY, Ong E, Gilkes NR, Kilburn DG, Muhandiram DR, Harris-Brandts M, Carver JP, Kay LE, Harvey TS; Biochemistry 1995;34:6993-7009.
[2044] 828. P

A unique feature of the eukaryotic subtilisin-like proprotein convertases is the presence of an additional highly conserved sequence of approximately 150 residues (P domain) located immediately downstream of the catalytic domain.
Number of members: 81

References:

[2045]

[1] Medline: 94252314. A C-terminal domain conserved in precursor processing proteases is required for intramolecular N-terminal maturation of pro-Kex2 protease. Gluschankof P, Fuller RS; EMBO J 1994;13:2280-2288.

[2] Medline: 98225190. Regulatory roles of the P domain of the subtilisin-like prohormone convertases. Zhou A, Martin S, Lipkind G, LaMendola J, Steiner DF; J Biol Chem 1998;273:11107-11114.

[2046] 829. Uncharacterized protein family UPF0020 signature
PROSITE cross-reference(s): PS01261; UPF0020

The following uncharacterized proteins have been shown [1] to share regions of similarities:

- Escherichia coli hypothetical protein ycbY and HI0116/15, the corresponding Haemophilus influenzae protein.
- Bacillus subtilis hypothetical protein ypsC.
- Synechocystis strain PCC 6803 hypothetical protein slr0064.
- Methanococcus jannaschii hypothetical proteins MJ0438 and MJ0710.

[2047] These are hydrophilic proteins of from 40 Kd to about 80 Kd. They can be picked up in the database by the following pattern.

[2048] Consensus pattern: D-P-[LIVMF]-C-G-[ST]-G-x(3)-[LI]-E
References:

[2049] [1] Bairoch A. Unpublished observations (1997).
[2050] 830. Uncharacterized protein family UPF0031 signatures

PROSITE cross-reference(s): PS01049; UPF0031_1, PS01050; UPF0031_2
The following uncharacterized proteins have been shown [1] to share regions of similarities:

- Yeast chromosome XI hypothetical protein YKL151C.
- Caenorhabditis elegans hypothetical protein R107.2.
- Escherichia coli hypothetical protein yjeF.
- Bacillus subtilis hypothetical protein yxkO.
- Helicobacter pylori hypothetical protein HP1363.
- Mycobacterium tuberculosis hypothetical protein MtCY77.05c.
- Mycobacterium leprae hypothetical protein B229_C2_201.
- Synechocystis strain PCC 6803 hypothetical protein sll1433.
- Methanococcus jannaschii hypothetical protein MJ1588.

[2051] These are proteins of about 30 to 40 Kd whose central region is well conserved. They can be picked up in the database by the following patterns.

[2052] Consensus pattern: [SAV]-[IVW]-[LVA]-[LIV]-G-[PNS]-G-L-[GP]-x-[DENQT]
Consensus pattern: [GA]-G-x-G-D-[TV]-[LT]-[STA]-G-x-[LIVM]
[2053] 831. (ACOX)
Acyl-CoA oxidase

[2054] This is a family of Acyl-CoA oxidases EC 1.3.3.6. Acyl-CoA oxidase converts acyl-CoA into trans-2-enoyl-CoA.

Number of members: 39

[2055] [1] Hayashi H, De Bellis L, Yamaguchi K, Kato A, Hayashi M, Nishimura M; Medline: 981926^4. Molecular characterization of a glyoxysomal long chain acyl-CoA oxidase that is synthesized as a precursor of higher molecular mass in pumpkin. J Biol Chem 1998;273:8301-8307.
[2056] 832. (AICARFT_IMPCHas)
AICARFT/IMPCHase bienzyme

[2057] This is a family of bifunctional enzymes catalysing the last two steps in de novo purine biosynthesis. The bifunctional enzyme is found in both prokaryotes and eukaryotes. The second last step is catalysed by 5-aminoimidazole-4-carboxamide ribonucleotide formyltransferase EC 2.1.2.3 (AICARFT); this enzyme catalyses the formylation of AICAR with 10-formyl-tetrahydrofolate to yield FAICAR and tetrahydrofolate [1]. The last step is catalysed by IMP (inosine monophosphate) cyclohydrolase EC 3.5.4.10 (IMPCHase), cyclizing FAICAR (5-formylaminoimidazole-4-carboxamide ribonucleotide) to IMP [1].

Number of members: 22

[2058]

[1] Akira T, Komatsu M, Nango R, Tomooka A, Konaka K, Yamauchi M, Kitamura Y, Nomura S, Tsukamoto I; Medline: 97473523. "Molecular cloning and expression of a rat cDNA encoding 5-aminoimidazole-4-carboxamide ribonucleotide formyltransferase/IMP cyclohydrolase" [published erratum appears in Gene 1998 Feb 27;203(2):337]. Gene 1997;197:289-293.

[2] Rayl EA, Moroson BA, Beardsley GP; Medline: 96147205. "The human purH gene product, 5-aminoimidazole-4-carboxamide ribonucleotide formyltransferase/IMP cyclohydrolase. Cloning, sequencing, expression, purification, kinetic analysis, and domain mapping." J Biol Chem 1996;271:2225-2233.

[2059] 833. (AOX) 
Alternative oxidase 

[2060] The alternative oxidase is used as a second terminal oxidase in the mitochondria; electrons are transferred
directly from reduced ubiquinol to oxygen forming water [2]. This is not coupled to ATP synthesis and is not inhibited
by cyanide; this pathway is a single step process [1]. In rice the transcript levels of the alternative oxidase are increased
by low temperature [1].

Number of members: 27

[2061]

[1] Ito Y, Saisho D, Nakazono M, Tsutsumi N, Hirai A; Medline: 98086211 "Transcript levels of tandem-arranged
alternative oxidase genes in rice are increased by low temperature." Gene 1997;203:121-129.

[2] Li Q, Ritzel RG, McLean LL, McIntosh L, Ko T, Bertrand H, Nargang FE; Medline: 98366413 "Cloning and analysis
of the alternative oxidase gene of Neurospora crassa." Genetics 1996;142:129-140.

[2062] 834. (APH)

Protein kinases signatures and profile

[2063] Cross-reference(s): PS00107; PROTEIN_KINASE_ATP, PS00108; PROTEIN_KINASE_ST, PS00109;
PROTEIN_KINASE_TYR, PS50011; PROTEIN_KINASE_DOM

[2064] Eukaryotic protein kinases [1 to 5] are enzymes that belong to a very extensive family of proteins which share
a conserved catalytic core common to both serine/threonine and tyrosine protein kinases. There are a number of
conserved regions in the catalytic domain of protein kinases. Two of these regions have been selected to build signature
patterns. The first region, which is located in the N-terminal extremity of the catalytic domain, is a glycine-rich stretch
of residues in the vicinity of a lysine residue, which has been shown to be involved in ATP binding. The second region,
which is located in the central part of the catalytic domain, contains a conserved aspartic acid residue which is important
for the catalytic activity of the enzyme [6]; two signature patterns were derived for that region, one specific for
serine/threonine kinases and the other for tyrosine kinases. A profile was developed which is based on the alignment
in [1] and covers the entire catalytic domain.

[2065] Consensus pattern: [LIV]-G-{P}-G-{P}-[FYWMGSTNH]-[SGA]-{PW}-[LIVCAT]-{PD}-x-[GSTACLIVMFY]-x(5,18)-
[LIVMFYWCSTAR]-[AIVP]-[LIVMFAGCKR]-K [K binds ATP]

[2066] Sequences known to belong to this class detected by the pattern: the majority of known protein kinases, but it
fails to find a number of them, especially viral kinases, which are quite divergent in this region and are completely
missed by this pattern.

[2067] Consensus pattern: [LIVMFYC]-x-[HY]-x-D-[LIVMFY]-K-x(2)-N-[LIVMFYCT](3) [D is an active site residue]
[2068] Sequences known to belong to this class detected by the pattern: most serine/threonine specific protein
kinases with 10 exceptions (half of them viral kinases) and also Epstein-Barr virus BGLF4 and Drosophila ninaC, which
have respectively Ser and Arg instead of the conserved Lys and which are therefore detected by the tyrosine kinase
specific pattern described below.

[2069] Consensus pattern: [LIVMFYC]-x-[HY]-x-D-[LIVMFY]-[RSTAC]-x(2)-N-[LIVMFYC](3) [D is an active site residue]
Sequences known to belong to this class detected by the pattern: all tyrosine specific protein kinases with the
exception of human ERBB3 and mouse blk. This pattern will also detect most bacterial aminoglycoside
phosphotransferases [8,9] and herpesviruses ganciclovir kinases [10], which are proteins structurally and evolutionary
related to protein kinases. Sequences known to belong to this class detected by the profile: ALL, except for three viral
kinases. This profile also detects receptor guanylate cyclases (see <PDOC00430>) and 2-5A-dependent ribonucleases.
Sequence similarities between these two families and the eukaryotic protein kinase family have been noticed before.
It also detects Arabidopsis thaliana kinase-like protein TMKL1, which seems to have lost its catalytic activity.

[2070] Note: if a protein analyzed includes the two protein kinase signatures, the probability of it being a protein kinase
is close to 100%. Note: eukaryotic-type protein kinases have also been found in prokaryotes such as Myxococcus
xanthus [11] and Yersinia pseudotuberculosis. Note: the patterns shown above have been updated since their publication
in [7]. Note: this documentation entry is linked to both signature patterns and a profile. As the profile is much more
sensitive than the patterns, you should use it if you have access to the necessary software tools to do so.
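The note that a protein carrying both protein kinase signatures is almost certainly a protein kinase can be turned into a simple two-pattern check. The sketch below is illustrative only: the two regular expressions are a hand translation of the ATP-binding pattern and the serine/threonine active-site pattern as read from this entry (the printed text is OCR-damaged, so the exact residue sets should be treated as assumptions), and the function name and the toy sequence are invented for the example.

```python
import re

# Assumed regex renderings of two consensus patterns from this entry.
ATP_BINDING = re.compile(
    r"[LIV]G[^P]G[^P][FYWMGSTNH][SGA][^PW][LIVCAT][^PD]."
    r"[GSTACLIVMFY].{5,18}[LIVMFYWCSTAR][AIVP][LIVMFAGCKR]K"
)
SER_THR_ACTIVE_SITE = re.compile(r"[LIVMFYC].[HY].D[LIVMFY]K.{2}N[LIVMFYCT]{3}")

def has_both_signatures(sequence: str) -> bool:
    """Return True when both the ATP-binding and the Ser/Thr active-site patterns match."""
    return bool(ATP_BINDING.search(sequence)) and bool(SER_THR_ACTIVE_SITE.search(sequence))

# Toy fragments built to satisfy the patterns; they are not data from this document.
n_terminal_part = "FERIKTLGTGSFGRVMLVKHMETGNHYAMKILDKQ"
central_part = "LDLIYRDLKPENLLIDQ"
toy_kinase = n_terminal_part + "AAAA" + central_part

print(has_both_signatures(toy_kinase))       # both regions present
print(has_both_signatures(n_terminal_part))  # only the glycine-rich region present
```

Because the patterns miss some divergent kinases, a negative result from such a check does not rule out kinase identity; the profile remains the more sensitive test, as the note above states.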
[2072] 835. (Asp_Glu_race)
Aspartate and glutamate racemases signatures
[2073] Cross-reference(s): PS00923; ASP_GLU_RACEMASE_1, PS00924;
ASP_GLU_RACEMASE_2

[2074] Aspartate racemase (EC 5.1.1.13) and glutamate racemase (EC 5.1.1.3) are two evolutionary related bacterial
enzymes that do not seem to require a cofactor for their activity [1]. Glutamate racemase, which interconverts
L-glutamate into D-glutamate, is required for the biosynthesis of peptidoglycan and some peptide-based antibiotics such
as gramicidin S. In addition to characterized aspartate and glutamate racemases, this family also includes a hypothetical
protein from Erwinia carotovora and one from Escherichia coli (ygeA). Two conserved cysteines are present in the
sequence of these enzymes. They are expected to play a role in catalytic activity by acting as bases in proton abstraction
from the substrate. Signature patterns were developed for both cysteines.
[2075] Consensus pattern: [IVA]-[LIVM]-x-C-x(0,1)-N-[ST]-[MSA]-[STH]-[LIVFYSTANK]

Consensus pattern: [LIVM](2)-x-[AG]-C-T-[DEH]-[LIVMFY]-[PNGRS]-x-[LIVM]
[2076] [ 1] Gallo K.A., Knowles J.R., Biochemistry 32:3981-3990(1993).
[2077] 836. (ATP-sulfurylase)
ATP-sulfurylase

[2078] This family consists of ATP-sulfurylase or sulfate adenylyltransferase EC:2.7.7.4, some of which are part of a
bifunctional polypeptide chain associated with adenosyl phosphosulphate (APS) kinase APS_kinase. Both enzymes
are required for PAPS (phosphoadenosine-phosphosulfate) synthesis from inorganic sulphate [2]. ATP sulfurylase
catalyses the synthesis of adenosine-phosphosulfate APS from ATP and inorganic sulphate [1].

Number of members: 37

[2079]

[1] Kurima K, Warman ML, Krishnan S, Domowicz M, Krueger RC Jr, Deyrup A, Schwartz NB; Medline: 98337975
"A member of a family of sulfate-activating enzymes causes murine brachymorphism" [published erratum appears
in Proc Natl Acad Sci U S A 1998 Sep 29;95(20):12071] Proc Natl Acad Sci U S A 1998;95:8681-8685.

[2] Rosenthal E, Leustek T; Medline: 96096529 "A multifunctional Urechis caupo protein, PAPS synthetase, has
both ATP sulfurylase and APS kinase activities." Gene 1995;165:243-248.

[2080] 837. (ATP-synt_F)

ATP synthase (F/14-kDa) subunit

[2081] This family includes the 14-kDa subunit from vATPases [1], which is in the peripheral catalytic part of the complex
[2]. The family also includes archaebacterial ATP synthase subunit F [3].

Number of members: 23

[2082]

[1] Guo Y, Kaiser K, Wieczorek H, Dow JA; Medline: 96269411 "The Drosophila melanogaster gene vha14 encoding
a 14-kDa F-subunit of the vacuolar ATPase." Gene 1996;172:239-243.

[2] Peng SB, Crider BP, Tsai SJ, Xie XS, Stone DK; Medline: 96216416 "Identification of a 14-kDa subunit associated
with the catalytic sector of clathrin-coated vesicle H+-ATPase." J Biol Chem 1996;271:3324-3327.
[3] Wilms R, Freiberg C, Wegerle E, Meier I, Mayer F, Muller V; Medline: 96524968 "Subunit structure and
organization of the genes of the A1A0 ATPase from the Archaeon Methanosarcina mazei Go1." J Biol Chem 1996;271:
18843-18852.

[2083] 838. (CBDJI)
Starch binding domain

Number of members: 48
[2084] 839. (CbiX)

[2085] The function of CbiX is uncertain, however it is found in cobalamin biosynthesis operons and so may have a
related function. Some CbiX proteins contain a striking histidine-rich region at their C-terminus, which suggests that it
might be involved in metal chelation.

Number of members: 6

[2086] [1] Raux E, Lanois A, Warren MJ, Rambach A, Thermes C; Medline: 98416126 "Cobalamin (vitamin B12)
biosynthesis: identification and characterization of a Bacillus megaterium cobI operon." Biochem J 1998;335:159-166.

840. (Complex1_51K)

[2087] Respiratory-chain NADH dehydrogenase 51 Kd subunit signatures. Cross-reference(s): PS00644;
COMPLEX1_51K_1, PS00645; COMPLEX1_51K_2

[2088] Respiratory-chain NADH dehydrogenase (EC 1.6.5.3) [1,2] (also known as complex I or NADH-ubiquinone
oxidoreductase) is an oligomeric enzymatic complex located in the inner mitochondrial membrane which also seems
to exist in the chloroplast and in cyanobacteria (as a NADH-plastoquinone oxidoreductase). Among the 25 to 30
polypeptide subunits of this bioenergetic enzyme complex there is one with a molecular weight of 51 Kd (in mammals),
which is the second largest subunit of complex I and is a component of the iron-sulfur (IP) fragment of the enzyme. It
seems to bind to NAD, FMN, and a 2Fe-2S cluster.
[2089] The 51 Kd subunit is highly similar to [3,4]:

- Subunit alpha of Alcaligenes eutrophus NAD-reducing hydrogenase (gene hoxF), which also binds to NAD, FMN,
and a 2Fe-2S cluster.
- Subunit NQO1 of Paracoccus denitrificans NADH-ubiquinone oxidoreductase.
- Subunit F of Escherichia coli NADH-ubiquinone oxidoreductase (gene nuoF).

[2090] The 51 Kd subunit and the bacterial hydrogenase alpha subunit contain three regions of sequence
similarities. The first one most probably corresponds to the NAD-binding site, the second to the FMN-binding site, and
the third one, which contains three cysteines, to the iron-sulfur binding region. Signature patterns have been developed
for the FMN-binding and for the 2Fe-2S binding regions.

[2091] Consensus pattern: G-[AM]-G-[AR]-Y-[LIVM]-C-G-[DE](2)-[STA](2)-[LIM](2)-[EN]-S
Consensus pattern: E-S-C-G-x-C-x-P-C-R-x-G [The three C's are putative 2Fe-2S ligands]

[ 1] Ragan C.I., Curr. Top. Bioenerg. 15:1-36(1987).
[ 2] Weiss H., Friedrich T., Hofhaus G., Preis D., Eur. J. Biochem. 197:563-576(1991).
[ 3] Fearnley I.M., Walker J.E., Biochim. Biophys. Acta 1140:105-134(1992).
[ 4] Weidner U., Geier S., Ptock A., Friedrich T., Leif H., Weiss H., J. Mol. Biol. 233:109-122(1993).

[2092] 841. (DAP_epimerase)
Diaminopimelate epimerase signature
[2093] Cross-reference(s): PS01326; DAP_EPIMERASE
Diaminopimelate epimerase (EC 5.1.1.7) catalyzes the isomerization of L,L- to D,L-meso-diaminopimelate in the
biosynthetic pathway leading from aspartate to lysine. This enzyme is a protein of about 30 Kd. Two conserved cysteines
seem [1] to function as the acid and base in the catalytic mechanism. As a signature pattern, the region surrounding
the first of these two active site cysteines was selected.
[2094] Consensus pattern: N-x-D-G-S-x(4)-C-G-N-[GA]-x-R [C is an active site residue] Sequences known to belong
to this class detected by the pattern: ALL, except for an Anabaena dapF which has a Ser instead of the active site Cys.
[2095] [ 1] Cirilli M., Zheng R., Scapin G., Blanchard J.S., Biochemistry 37:16452-16458(1998).
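A consensus pattern that marks an active site residue can also be used to report where that residue lies in a sequence. The following sketch, offered only as an illustration, searches for the diaminopimelate epimerase pattern and returns the 1-based position of the candidate active site cysteine. The function name and both toy sequences are assumptions introduced for this example; the second sequence mimics the stated exception in which a Ser replaces the Cys and the pattern therefore fails.

```python
import re

# N-x-D-G-S-x(4)-C-G-N-[GA]-x-R, with the active site Cys captured as group 1.
DAP_EPIMERASE = re.compile(r"N.DGS.{4}(C)GN[GA].R")

def active_site_cys(sequence: str) -> list:
    """Return the 1-based positions of the Cys inside each pattern match."""
    return [m.start(1) + 1 for m in DAP_EPIMERASE.finditer(sequence)]

toy_with_cys = "MMNADGSAAAACGNGARQ"   # invented; matches the pattern
toy_with_ser = "MMNADGSAAAASGNGARQ"   # invented; Ser in place of the Cys

print(active_site_cys(toy_with_cys))
print(active_site_cys(toy_with_ser))
```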
[2096] 842. (DNA_gyraseB_C)
DNA topoisomerase II signature

[2097] Cross-reference(s): PS00177; TOPOISOMERASE_II

DNA topoisomerase II (EC 5.99.1.3) [1,2,3,4,E1] is one of the two types of enzyme that catalyse the interconversion
of topological DNA isomers. Type II topoisomerases are ATP-dependent and act by passing a DNA segment through
a transient double-strand break. Topoisomerase II is found in phages, archaebacteria, prokaryotes, eukaryotes, and
in African Swine Fever virus (ASF). In bacteriophage T4, topoisomerase II consists of three subunits (the product of
genes 39, 52 and 60). In prokaryotes and in archaebacteria the enzyme, known as DNA gyrase, consists of two subunits
(genes gyrA and gyrB [E2]). In some bacteria, a second type II topoisomerase has been identified; it is known as
topoisomerase IV and is required for chromosome segregation; it also consists of two subunits (genes parC and parE).
In eukaryotes, type II topoisomerase is a homodimer.

[2098] There are many regions of sequence homology between the different subtypes of topoisomerase II. The
relation between the different subunits is shown in the following representation:

<----------------About-1400-residues----------------->

[-Protein 39-*----][----------Protein 52-------------]   Phage T4
[-gyrB-------*----][----------gyrA-------------------]   Prokaryote II, Archaebacteria
[-parE-------*----][----------parC-------------------]   Prokaryote IV
[------------*---------------------------------------]   Eukaryote and ASF

'*': Position of the pattern.

[2099] As a signature pattern for this family of proteins, a region that contains a highly conserved pentapeptide was
selected. The pattern is located in gyrB, in parE, and in protein 39 of phage T4 topoisomerase.
[2100] Consensus pattern: [LIVMA]-x-E-G-[DN]-S-A-x-[STAG]

[ 1] Sternglanz R., Curr. Opin. Cell Biol. 1:533-535(1990).

[ 2] Bjornsti M.-A., Curr. Opin. Struct. Biol. 1:99-103(1991).

[ 3] Sharma A., Mondragon A., Curr. Opin. Struct. Biol. 5:39-47(1995).

[ 4] Roca J., Trends Biochem. Sci. 20:156-160(1995).

[2101] 843. (DUF16) 
Protein of unknown function 

[2102] The function of this protein is unknown. It appears to only occur in Mycoplasma pneumoniae.
Number of members: 26

[2103] [1] Himmelreich R, Hilbert H, Plagens H, Pirkl E, Li BC, Herrmann R; Medline: 97105885 "Complete sequence
analysis of the genome of the bacterium Mycoplasma pneumoniae." Nucleic Acids Res 1996;24:4420-4449.
[2104] 844. (DUF21)
[2105] Domain of unknown function 

[2106] This transmembrane region has no known function. Many of the sequences in this family are annotated as
hemolysins; however, this is due to a similarity to Swiss:Q54318 that does not contain this domain. This domain is
found in the N-terminus of the proteins adjacent to two intracellular CBS domains CBS.
Number of members: 42

[2107] 845. (DUF58)
[2108] Integral membrane protein
[2109] The members of this family are putative integral membrane proteins. The function of the family is unknown;
however, the family includes Sec59 from yeast. Sec59 is a dolichol kinase EC:2.7.1.108, but it is not clear if the enzymatic
activity resides in this region or its N terminal region.

Number of members: 13 

[2110] 846. (DUF94)

[2111] Domain of unknown function

[2112] The function of this domain is unknown. It is found in both eukaryotes and archaebacteria. The alignment
contains a completely conserved aspartate residue that may be functionally important. The eukaryotic domains contain
three conserved cysteines and a histidine that might be metal binding; however, these are absent in the archaebacterial
proteins.

Number of members: 9 

[2113] 847. (FF)
[2114] FF domain 

[2115] This domain may be involved in protein-protein interaction [1] 
Number of members: 42 

[2116] [1] Bedford MT, Leder P; Medline: 99322199 "The FF domain: a novel motif that often accompanies WW
domains." Trends Biochem Sci 1999;24:264-265.
[2117] 848. (FLO_LFY) 
Floricaula / Leafy protein 

[2118] This family consists of various plant development proteins which are homologues of floricaula (FLO) and Leafy
(LFY) proteins, which are floral meristem identity proteins. Mutations in the sequences of these proteins affect flower
and leaf development.

Number of members: 16

[2119]

[1] Hofer J, Turner L, Hellens R, Ambrose M, Matthews P, Michael A, Ellis N; Medline: 97411151 "UNIFOLIATA
regulates leaf and flower morphogenesis in pea." Curr Biol 1997;7:581-587.
[2] Weigel D, Alvarez J, Smyth DR, Yanofsky MF, Meyerowitz EM; Medline: 92274452 "LEAFY controls floral
meristem identity in Arabidopsis." Cell 1992;69:843-859.

[2120] 849. (G-patch) 
G-patch domain 

[2121] This domain is found in a number of RNA binding proteins, and is also found in proteins that contain RNA
binding domains. This suggests that this domain may have an RNA binding function. This domain has seven highly
conserved glycines.

Number of members: 47

[2122] [1] Aravind L, Koonin EV; Medline: 10470032 "G-patch: a new conserved domain in eukaryotic RNA-processing
proteins and type D retroviral polyproteins." Trends Biochem Sci 1999;24:342-344.
[2123] 850. (Gram-ve_porins)
General diffusion Gram-negative porins signature
[2124] Cross-reference(s): PS00576; GRAM_NEG_PORIN

The outer membrane of Gram-negative bacteria acts as a molecular filter for hydrophilic compounds. Proteins, known 
as porins [1], are responsible for the 'molecular sieve' properties of the outer membrane. Porins form large water-filled
channels which allow the diffusion of hydrophilic molecules into the periplasmic space. Some porins form general
diffusion channels that allow any solutes up to a certain size (that size is known as the exclusion limit) to cross the
membrane, while other porins are specific for a solute and contain a binding site for that solute inside the pores (these
are known as selective porins). As porins are the major outer membrane proteins, they also serve as receptor sites
for the binding of phages and bacteriocins. General diffusion porins generally assemble as trimers in the membrane,
and the transmembrane core of these proteins is composed exclusively of beta strands [2]. It has been shown [2] that
a number of general porins are evolutionary related; these porins are:

- Enterobacteria phoE.
- Enterobacteria ompC.
- Enterobacteria ompF.
- Enterobacteria nmpC.
- Bacteriophage PA-2 LC.
- Neisseria PI.A.
- Neisseria PI.B.

[2125] As a signature pattern a conserved region was selected, located in the C-terminal part of these proteins,
which spans two putative transmembrane beta strands.

[2126] Consensus pattern: [LIVMFY]-x(2)-G-x(2)-Y-x-F-x-K-x(2)-[SN]-[STAV]-[LIVMFYW]-V

[ 1] Benz R., Bauer K., Eur. J. Biochem. 176:1-19(1988).

[ 2] Jap B.K., Walian P.J., Q. Rev. Biophys. 23:367-403(1990).

[ 3] Jeanteur D., Lakey J.H., Pattus F., Mol. Microbiol. 5:2153-2164(1991).

[2127] 851. (HlyD)
HlyD family secretion proteins signature

[2128] Cross-reference(s): PS00543; HLYD_FAMILY

Gram-negative bacteria produce a number of proteins which are secreted into the growth medium by a mechanism
that does not require a cleaved N-terminal signal sequence. These proteins, while having different functions, require
the help of two or more proteins for their secretion across the cell envelope. Amongst which a protein belonging to the
ABC transporters family (see the relevant entry <PDOC00185>) and a protein belonging to a family which is currently
composed [1 to 5] of the following members:

Gene   Species                      Protein which is exported

hlyD   Escherichia coli             Hemolysin
appD   A. pleuropneumoniae          Hemolysin
lcnD   Lactococcus lactis           Lactococcin A
lktD   A. actinomycetemcomitans,    Leukotoxin
       Pasteurella haemolytica
rtxD   A. pleuropneumoniae          Toxin-III
cyaD   Bordetella pertussis         Calmodulin-sensitive adenylate cyclase-hemolysin (cyclolysin)
cvaA   Escherichia coli             Colicin V
prtE   Erwinia chrysanthemi         Extracellular proteases B and C
aprE   Pseudomonas aeruginosa       Alkaline protease
emrA   Escherichia coli             Drugs and toxins
yjcR   Escherichia coli             Unknown

These proteins are evolutionary related and consist of from 390 to 480 amino acid residues. They seem to be anchored
in the inner membrane by a N-terminal transmembrane region. Their exact role in the secretion process is not yet
known. The C-terminal section of these proteins is the best conserved region: a signature pattern from that region was
derived.

[2129] Consensus pattern: [LIVM]-x(2)-G-[LM]-x(3)-[STGA ... [LIVMFYW](3)

Sequences known to belong to this class detected by the pattern: ALL, except for emrA and yjcR.

References:
[2130]

[1] Gilson L., Mahanty H.K., Kolter R., EMBO J. 9:3875-3884(1990).

[2] Letoffe S., Delepelaire P., Wandersman C., EMBO J. 9:1375-1382(1990).

[3] Stoddard G.W., Petzel J.P., van Belkum M.J., Kok J., McKay L.L., Appl. Environ. Microbiol. 58:1952-1961(1992).
[4] Duong F., Lazdunski A., Cami B., Murgier M., Gene 121:47-54(1992).
[5] Lewis K., Trends Biochem. Sci. 19:119-123(1994).

[2131] 852. (IBR) 
In Between Ring fingers 

[2132] The IBR (In Between Ring fingers) domain is found to occur between pairs of ring fingers (zf-C3HC4). The
function of this domain is unknown. This domain has also been called the C6HC domain and DRIL (for double RING
finger linked) domain [2].

Number of members: 25

[2133]

[1] Morett E, Bork P; Medline: 10386851 "A novel transactivation domain in parkin." Trends Biochem Sci 1999;24:
229-231.

[2] van der Reijden BA, Erpelinck-Verschueren CA, Lowenberg B, Jansen JH; Medline: 99349709 "TRIADs: a new
class of proteins with a novel cysteine-rich signature." Protein Sci 1999;8:1557-1561.

[2134] 853. (IPPT)
IPP transferase

[1] Durand JM, Bjork GR, Kuwae A, Yoshikawa M, Sasakawa C; Medline: 97440128 "The modified nucleoside
2-methylthio-N6-isopentenyladenosine in tRNA of Shigella flexneri is required for expression of virulence genes."
J Bacteriol 1997;179:5777-5782.

[2] Boguta M, Hunter LA, Shen WC, Gillman EC, Martin NC, Hopper AK; Medline: 94187700 "Subcellular locations
of MOD5 proteins: mapping of sequences sufficient for targeting to mitochondria and demonstration that
mitochondrial and nuclear isoforms commingle in the cytosol." Mol Cell Biol 1994;14:2298-2308.
[3] Gillman EC, Slusher LB, Martin NC, Hopper AK; Medline: 91203858 "MOD5 translation initiation sites determine
N6-isopentenyladenosine modification of mitochondrial and cytoplasmic tRNA." Mol Cell Biol 1991;11:2382-2390.

[2135] 854. (KE2)
KE2 family protein

[2136] The function of members of this family is unknown, although they have been suggested to contain a DNA
binding leucine zipper motif [2].

Number of members: 9

[2137]

[1] Ha H, Abe K, Artzt K; Medline: 92084131 "Primary structure of the embryo-expressed gene KE2 from the mouse
H-2K region." Gene 1991;107:345-348.

[2] Shang HS, Wong SM, Tan MM, Wu M; Medline: 95129859 "YKE2, a yeast nuclear gene encoding a protein
showing homology to mouse KE2 and containing a putative leucine-zipper motif." Gene 1994;151:197-201.

[2138] 855. (Lipoprotein)
Prokaryotic membrane lipoprotein lipid attachment site
[2139] Cross-reference(s): PS00013; PROKAR_LIPOPROTEIN
In prokaryotes, membrane lipoproteins are synthesized with a precursor signal peptide, which is cleaved by a specific
lipoprotein signal peptidase (signal peptidase II). The peptidase recognizes a conserved sequence and cuts upstream
of a cysteine residue to which a glyceride-fatty acid lipid is attached [1]. Some of the proteins known to undergo such
processing currently include (for recent listings see [1,2,3]):

- Major outer membrane lipoprotein (murein-lipoproteins) (gene lpp).
- Escherichia coli lipoprotein-28 (gene nlpA).
- Escherichia coli lipoprotein-34 (gene nlpB).
- Escherichia coli lipoprotein nlpC.
- Escherichia coli lipoprotein nlpD.
- Escherichia coli osmotically inducible lipoprotein B (gene osmB).
- Escherichia coli osmotically inducible lipoprotein E (gene osmE).
- Escherichia coli peptidoglycan-associated lipoprotein (gene pal).
- Escherichia coli rare lipoproteins A and B (genes rlpA and rlpB).
- Escherichia coli copper homeostasis protein cutF (or nlpE).
- Escherichia coli plasmids traT proteins.
- Escherichia coli Col plasmids lysis proteins.
- A number of Bacillus beta-lactamases.
- Bacillus subtilis periplasmic oligopeptide-binding protein (gene oppA).
- Borrelia burgdorferi outer surface proteins A and B (genes ospA and ospB).
- Borrelia hermsii variable major protein 21 (gene vmp21) and 7 (gene vmp7).
- Chlamydia trachomatis outer membrane protein 3 (gene omp3).
- Fibrobacter succinogenes endoglucanase cel-3.
- Haemophilus influenzae proteins Pal and Pcp.
- Klebsiella pullulanase (gene pulA).
- Klebsiella pullulanase secretion protein pulS.
- Mycoplasma hyorhinis protein p37.
- Mycoplasma hyorhinis variant surface antigens A, B, and C (genes vlpABC).
- Neisseria outer membrane protein H.8.
- Pseudomonas aeruginosa lipopeptide (gene lppL).
- Pseudomonas solanacearum endoglucanase egl.
- Rhodopseudomonas viridis reaction center cytochrome subunit (gene cytC).
- Rickettsia 17 Kd antigen.
- Shigella flexneri invasion plasmid proteins mxiJ and mxiM.
- Streptococcus pneumoniae oligopeptide transport protein A (gene amiA).
- Treponema pallidum 34 Kd antigen.
- Treponema pallidum membrane protein A (gene tmpA).
- Vibrio harveyi chitobiase (gene chb).
- Yersinia virulence plasmid protein yscJ.
- Halocyanin from Natrobacterium pharaonis [4], a membrane associated copper-binding protein. This is the first
archaebacterial protein known to be modified in such a fashion.

[2140] From the precursor sequences of all these proteins, a consensus pattern and a set of rules to identify this
type of post-translational modification were derived.
[2141] Consensus pattern: {DERK}(6)-[LIVMFWSTAG](2)-[LIVMFYSTAGCQ]-[AGS]-C [C is the lipid attachment
site] Additional rules:

[2142] 1) The cysteine must be between positions 15 and 35 of the sequence in consideration. 2) There must be at
least one Lys or one Arg in the first seven positions of the sequence. Sequences known to belong to this class detected
by the pattern: ALL. Other sequence(s) detected in SWISS-PROT: some 100 prokaryotic proteins. Some of them are
not membrane lipoproteins, but at least half of them could be.
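The lipid attachment site entry combines a consensus pattern with two positional rules, which makes it a convenient case for a small worked example. The sketch below is an illustration under stated assumptions, not the procedure used by PROSITE itself: the regular expression is a hand translation of the consensus pattern as read from this entry, positions are counted from 1, the range 15 to 35 is taken as inclusive, and the function name and toy sequences are invented for the example.

```python
import re

# {DERK}(6)-[LIVMFWSTAG](2)-[LIVMFYSTAGCQ]-[AGS]-C, wrapped in a lookahead so that
# overlapping candidate sites are all examined.
SITE = re.compile(r"(?=([^DERK]{6}[LIVMFWSTAG]{2}[LIVMFYSTAGCQ][AGS]C))")

def lipid_attachment_site(sequence: str):
    """Return the 1-based position of the candidate lipidated Cys, or None."""
    # Rule 2: at least one Lys or Arg in the first seven positions.
    if not re.search(r"[KR]", sequence[:7]):
        return None
    for m in SITE.finditer(sequence):
        # The Cys is the last residue of the matched stretch.
        cys_position = m.start(1) + len(m.group(1))
        # Rule 1: the Cys must lie between positions 15 and 35 (assumed inclusive).
        if 15 <= cys_position <= 35:
            return cys_position
    return None

toy_precursor = "MKATKLVLGAVILGSTLLAGCSSNAKIDQ"   # invented example sequence
no_basic_start = "MAATALVLGAVILGSTLLAGCSSNAKIDQ"  # same, without Lys/Arg in the first seven

print(lipid_attachment_site(toy_precursor))
print(lipid_attachment_site(no_basic_start))
```

As the entry notes, a hit of this kind only flags a candidate: roughly half of the additional sequences detected may not be membrane lipoproteins.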

References 

[2143] 

[1] Hayashi S., Wu H.C., J. Bioenerg. Biomembr. 22:451-471(1990).
[2] Klein P., Somorjai R.L., Lau P.C.K., Protein Eng. 2:16-20(1988).
[3] von Heijne G., Protein Eng. 2:531-534(1989).

[4] Mattar S., Scharf B., Kent S.B.H., Rodewald K., Oesterhelt D., Engelhard M., J. Biol. Chem. 269:14939-14945
(1994).

[2144] 856. (Lipoprotein_7)
Adhesin lipoprotein
[2145] This family consists of the p50 and variable adherence-associated antigen (Vaa) adhesins from Mycoplasma
hominis. M. hominis is a mycoplasma associated with human urogenital diseases, pneumonia, and septic arthritis [1].
An adhesin is a cell surface molecule that mediates adhesion to other cells or to the surrounding surface or substrate.
The Vaa antigen is a 50-kDa surface lipoprotein that has four tandem repetitive DNA sequences encoding a periodic
peptide structure and is highly immunogenic in the human host [1]. p50 is also a 50-kDa lipoprotein, having three
repeats A, B, and C, that may be a tetramer of 191 kDa in its native environment [2].

Number of members: 18

[2146]

[1] Zhang Q, Wise KS; Medline: 98294788 "Molecular basis of size and antigenic variation of a Mycoplasma hominis
adhesin encoded by divergent vaa genes." Infect Immun 1996;64:2737-2744.

[2] Henrich B, Kitzerow A, Feldmann RC, Schaal H, Hadding U; Medline: 97047675 "Repetitive elements of the
Mycoplasma hominis adhesin p50 can be differentiated by monoclonal antibodies." Infect Immun 1996;64:
4027-4034.

[2147] 857. (MaoC_like)
MaoC like domain

[2148] The MaoC protein is found to share similarity with a wide variety of enzymes: estradiol 17 beta-dehydrogenase
4, peroxisomal hydratase-dehydrogenase-epimerase, fatty acid synthase beta subunit. All these enzymes contain other
domains. This domain is also present in the NodN nodulation protein N. No specific function has been assigned to this
region of any of these proteins. The maoC gene is part of an operon with maoA, which is involved in the synthesis of
monoamine oxidase [1].

Number of members: 46

[2149] [1] Sugino H, Sasaki M, Azakami H, Yamashita M, Murooka Y; Medline: 96235221 "A monoamine-regulated
Klebsiella aerogenes operon containing the monoamine oxidase structural gene (maoA) and the maoC gene."
J Bacteriol 1992;174:2485-2492.
[2150] 858. (MSP) 

Manganese-stabilizing protein / photosystem II polypeptide

[2151] This family consists of the 33 KDa photosystem II polypeptide from the oxygen evolving complex (OEC) of
plants and cyanobacteria. The protein is also known as the manganese-stabilizing protein as it is associated with the
manganese complex of the OEC and may provide the ligands for the complex [1].

Number of members: 17

[2152] [1] Philbrick JB, Zilinskas BA; Medline: 88334494 "Cloning, nucleotide sequence and mutational analysis of
the gene encoding the Photosystem II manganese-stabilizing polypeptide of Synechocystis 6803." Mol Gen Genet
1988;212:418-425.
[2153] 859. (NAC)

[2154] [1] Makarova KS, Aravind L, Galperin MY, Grishin NV, Tatusov RL, Wolf YI, Koonin EV; Medline: 99342100
"Comparative genomics of the Archaea (Euryarchaeota): evolution of conserved protein families, the stable core, and
the variable shell." Genome Res 1999;9:608-628.

Number of members: 27 

[2155] 860. (Nop)

Putative snoRNA binding domain

[2156] This family consists of various pre-RNA processing ribonucleoproteins. The function of the aligned region is
unknown; however, it may be a common RNA or snoRNA or Nop1p binding domain. Nop5p (Nop58p) Swiss:Q12499
from yeast is the protein component of a ribonucleoprotein required for pre-18S rRNA processing and is
suggested to function with Nop1p in a snoRNA complex [1]. Nop56p Swiss:O00567 and Nop5p interact with Nop1p and
are required for ribosome biogenesis [2]. Prp31p Swiss:P49704 is required for pre-mRNA splicing in S. cerevisiae [3].
Number of members: 23
[2157]

[1] Wu P, Brockenbrough JS, Metcalfe AC, Chen S, Aris JP; Medline: 98299165 "Nop5p is a small nucleolar
ribonucleoprotein component required for pre-18 S rRNA processing in yeast." J Biol Chem 1998;273:16453-16463.
[2] Gautier T, Berges T, Tollervey D, Hurt E; Medline: 98036777 "Nucleolar KKE/D repeat proteins Nop56p and Nop58p
interact with Nop1p and are required for ribosome biogenesis." Mol Cell Biol 1997;17:7088-7098.
[3] Weidenhammer EM, Singh M, Ruiz-Noriega M, Woolford JL Jr; Medline: 96184869 "The PRP31 gene encodes
a novel protein required for pre-mRNA splicing in Saccharomyces cerevisiae." Nucleic Acids Res 1996;24:
1164-1170.

[2158] 861. (Nramp)

Natural resistance-associated macrophage protein

The natural resistance-associated macrophage protein (NRAMP) family consists of Nramp1, Nramp2, and yeast
proteins Smf1 and Smf2. The NRAMP family is a novel family of functionally related proteins defined by a conserved
hydrophobic core of ten transmembrane domains [5]. This family of membrane proteins are divalent cation transporters.
Nramp1 is an integral membrane protein expressed exclusively in cells of the immune system and is recruited to the
membrane of a phagosome upon phagocytosis [1]. By controlling divalent cation concentrations, Nramp1 may regulate
the interphagosomal replication of bacteria [1]. Mutations in Nramp1 may genetically predispose an individual to
susceptibility to diseases including leprosy and tuberculosis; conversely, this might however provide protection from
rheumatoid arthritis [1]. Nramp2 is a multiple divalent cation transporter for Fe2+, Mn2+ and Zn2+ amongst others; it is
expressed at high levels in the intestine and is a major transferrin-independent iron uptake system in mammals [1]. The
yeast proteins Smf1 and Smf2 may also transport divalent cations [3].

Number of members: 36 

[2159] 

[1] Govoni G, Gros P; Medline: 98383998 "Macrophage NRAMP1 and its role in resistance to microbial infections."
Inflamm Res 1998;47:277-284.

[2] Agranoff DD, Krishna S; Medline: 98294035 "Metal ion homeostasis and intracellular parasitism." Mol Microbiol
1998;28:403-412.

[3] Pinner E, Gruenheid S, Raymond M, Gros P; Medline: 98030569 "Functional complementation of the yeast
divalent cation transporter family SMF by NRAMP2, a member of the mammalian natural resistance-associated
macrophage protein family." J Biol Chem 1997;272:28933-28938.

[4] Cellier M, Belouchi A, Gros P; Medline: 96402487 "Resistance to intracellular infections: comparative genomic
analysis of Nramp." Trends Genet 1996;12:201-204.

[5] Cellier M, Prive G, Belouchi A, Kwan T, Rodrigues V, Chia W, Gros P; Medline: 96036029 "Nramp defines a
family of membrane proteins." Proc Natl Acad Sci U S A 1995;92:10089-10093.

[2160] 862. (NTP_transf_2)
Nucleotidyltransferase domain

Members of this family belong to a large family of nucleotidyltransferases [1].

Number of members: 83

[2161] [1] Holm L, Sander C; Medline: 96005605 "DNA polymerase beta belongs to an ancient nucleotidyltransferase
superfamily." Trends Biochem Sci 1995;20:345-347.
[2162] 863. (Paramyxo_P)

Paramyxovirus P phosphoprotein 

[2163] This family consists of paramyxovirus P phosphoprotein from Sendai virus and human and bovine parainfluenza
viruses. The P protein is an essential part of the viral RNA polymerase complex formed from the P and L proteins
[1]. The exact role of the P protein in this complex is unknown, but it is involved in multiple protein-protein
interactions and in binding the polymerase complex to the nucleocapsid or ribonucleoprotein template. It also appears
to be important for the proper folding of the L protein [1]. The paramyxoviruses have a negative sense ssRNA genome [1].

Number of members: 15
[2164]

[1] Bowman MC, Smallwood S, Moyer SA; Medline: 99329169 "Dissection of individual functions of the Sendai
virus phosphoprotein in transcription." J Virol 1999;73:6474-6483.

[2] Matsuoka Y, Curran J, Pelet T, Kolakofsky D, Ray R, Compans RW; Medline: 9123780* "The P gene of human
parainfluenza virus type 1 encodes P and C proteins but not a cysteine-rich V protein." J Virol 1991;65:3406-3410.

[2165] 864. (Patatin)

[2166] This family consists of various patatin glycoproteins from plants. The patatin protein accounts for up to 40%
of the total soluble protein in potato tubers [2]. Patatin is a storage protein but it also has the enzymatic activity of
lipid acyl hydrolase, catalysing the cleavage of fatty acids from membrane lipids [2].

Number of members: 21

[2167]

[1] Banfalvi Z, Kostyal Z, Barta E; Medline: 95107249 "Solanum brevidens possesses a non-sucrose-inducible
patatin gene." Mol Gen Genet 1994;245:517-522.

[2] Mignery GA, Pikaard CS, Park WD; Medline: 83226014 "Molecular characterization of the patatin multigene
family of potato." Gene 1988;62:27-44.

[2168] 865. (Pentapeptide_2)
Pentapeptide repeats (8 copies)

[2169] These repeats are found in many mycobacterial proteins. They are most common in the PPE family
of proteins, where they are found in the MPTR subfamily of PPE proteins. The function of these repeats is unknown.
The repeat can be approximately described as XNXGX, where X can be any amino acid. These repeats are similar to
Pentapeptide [1]; however, it is not clear if these two families are structurally related.
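
As an illustration only, the approximate repeat description above can be turned into a simple sequence scan. The following Python sketch is not part of the described family definition: it assumes the repeat unit reads X-N-X-G-X (any residue, asparagine, any residue, glycine, any residue), and the function name `tandem_repeat_runs`, the `min_copies` threshold and the example sequence are hypothetical.

```python
import re


def tandem_repeat_runs(sequence, min_copies=2):
    """Return (start, copies) for each run of back-to-back X-N-X-G-X units.

    Assumption: the repeat unit is five residues long with N at position 2
    and G at position 4; every other position may hold any amino acid.
    """
    # "(?:.N.G.){k,}" matches k or more consecutive five-residue units.
    run_pattern = re.compile("(?:.N.G.){" + str(min_copies) + ",}")
    runs = []
    for match in run_pattern.finditer(sequence):
        runs.append((match.start(), len(match.group()) // 5))
    return runs


# Made-up sequence: two flanking residues on each side of three repeat units.
example = "MM" + "ANAGA" * 3 + "KK"
print(tandem_repeat_runs(example))  # -> [(2, 3)]
```

A plain regular expression is used here because the description is only approximate; a profile-based search would be more sensitive for real sequences.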

Number of members: 362 

[2170] 

[1] Bateman A, Murzin A, Teichmann SA; Medline: 98318059 "Structure and distribution of pentapeptide repeats
in bacteria." Protein Sci 1998;7:1477-1480.

[2] Cole ST, Brosch R, Parkhill J, Garnier T, Churcher C, Harris D, Gordon SV, Eiglmeier K, Gas S, Barry CE 3rd,
Tekaia F, Badcock K, Basham D, Brown D, Chillingworth T, Connor R, Davies R, Devlin K, Feltwell T, Gentles S,
Hamlin N, Holroyd S, Hornsby T, Jagels K, Barrell BG; Medline: 98295987 "Deciphering the biology of
Mycobacterium tuberculosis from the complete genome sequence." Nature 1998;393:537-544.

[2171] 866. (Peptidase_C13)

Peptidase C13 family

This family of peptidases is known as the hemoglobinase family because it contains a globin degrading enzyme from
blood parasites Swiss:P42665. However, relatives are found in plants and other organisms that have other functions.
Members of this family are asparaginyl peptidases [1].

Number of members: 26

[2172] [1] Chen JM, Dando PM, Rawlings ND, Brown MA, Young NE, Stevens RA, Hewitt E, Watts C, Barrett AJ;
Medline: 97218252 "Cloning, isolation, and characterization of mammalian legumain, an asparaginyl endopeptidase."
J Biol Chem 1997;272:8090-8098.
[2173] 867. (Pro_dh)
Proline dehydrogenase

Number of members: 25 

[2174] [1] Ling M, Allen SW, Wood JM; Medline: 95055736 "Sequence analysis identifies the proline dehydrogenase
and delta 1-pyrroline-5-carboxylate dehydrogenase domains of the multifunctional Escherichia coli PutA protein." J
Mol Biol 1994;243:950-956.
[2175] 868. (PsbP)

[2176] This family consists of the 23 kDa subunit of the oxygen evolving system of photosystem II, or PsbP, from various
plants (where it is encoded by the nuclear genome) and Cyanobacteria. The 23 kDa PsbP protein is required for PSII
to be fully operational in vivo; it increases the affinity of the water oxidation site for Cl- and provides the conditions
required for high affinity binding of Ca2+ [2].

Number of members: 25 


[2177] 

[1] Rova EM, Mc Ewen B, Fredriksson PO, Styring S; Medline: 97067138 "Photoactivation and photoinhibition are
competing in a mutant of Chlamydomonas reinhardtii lacking the 23-kDa extrinsic subunit of photosystem II." J
Biol Chem 1996;271:28918-28924.

[2] Kochhar A, Khurana JP, Tyagi AK; Medline: 97191538 "Nucleotide sequence of the psbP gene encoding
precursor of 23-kDa polypeptide of oxygen-evolving complex in Arabidopsis thaliana and its expression in the
wild-type and a constitutively photomorphogenic mutant." DNA Res 1996;3:277-285.

[2178] 869. (PUA)

[2179] The PUA domain, named after PseudoUridine synthase and Archaeosine transglycosylase, was detected in
archaeal and eukaryotic pseudouridine synthases, archaeal archaeosine synthases, a family of predicted ATPases
that may be involved in RNA modification, and a family of predicted archaeal and bacterial rRNA methylases. Additionally,
the PUA domain was detected in a family of eukaryotic proteins that also contain a domain homologous to the translation
initiation factor eIF1/SUI1; these proteins may comprise a novel type of translation factors. Unexpectedly, the PUA
domain was detected also in bacterial and yeast glutamate kinases; this is compatible with the demonstrated role of
these enzymes in the regulation of the expression of other genes [1]. It is predicted that the PUA domain is an RNA
binding domain.

Number of members: 48

[2180] [1] Aravind L, Koonin EV; Medline: 99193178 "Novel predicted RNA-binding domains associated with the
translation machinery." J Mol Evol 1999;48:291-302.
[2181] 870. (RF1)
eRF1-like proteins

[2182] Members of this family are peptide chain release factors. The eukaryotic Release Factor 1 proteins (eRF1s)
are involved in termination of translation. The eRF1 protein is functional for all stop codons and appears to abolish
read-through of these codons. This family also includes other proteins for which the precise molecular function is
unknown. Many of them are from Archaebacteria. These proteins may also be involved in translation termination but
this awaits experimental verification.

Number of members: 25

[2183]

[1] Frolova L, Le Goff X, Rasmussen HH, Cheperegin S, Drugeon G, Kress M, Arman I, Haenni AL, Celis JE,
Philippe M, et al; Medline: 85082951 "A highly conserved eukaryotic protein family possessing properties of
polypeptide chain release factor." [see comments] Nature 1994;372:701-703.

[2] Drugeon G, Jean-Jean O, Frolova L, Le Goff X, Philippe M, Kisselev L, Haenni AL; Medline: 97315314
"Eukaryotic release factor 1 (eRF1) abolishes readthrough and competes with suppressor tRNAs at all three
termination codons in messenger RNA." Nucleic Acids Res 1997;25:2254-2258.

[2184] 871. (Ribosomal_L14e)
Ribosomal protein L14
[2185] This family includes the eukaryotic ribosomal protein L14.

Number of members: 15

[2186] 872. (Ribosomal_S27)
Ribosomal protein S27a

[2187] This family of ribosomal proteins consists mainly of the 40S ribosomal protein S27a, which is synthesized as
a C-terminal extension of ubiquitin (CEP). The S27a domain comprises the C-terminal half of the protein. The
synthesis of ribosomal proteins as extensions of ubiquitin promotes their incorporation into nascent ribosomes by a
transient metabolic stabilization and is required for efficient ribosome biogenesis [3]. The ribosomal extension protein
S27a contains a basic region that is proposed to form a zinc finger; its fusion gene is proposed as a mechanism to
maintain a fixed ratio between ubiquitin, necessary for degrading proteins, and ribosomes, a source of proteins [2].

Number of members: 36 

[2188] 873. (Spermine_synth)
Spermine/spermidine synthase

[2189] Spermine and spermidine are polyamines. This family includes spermidine synthase, which catalyses the fifth
(last) step in the biosynthesis of spermidine from arginine, and spermine synthase.

Number of members: 39 

[2190] 

[1] Mezquita J, Pau M, Mezquita C; Medline: 97449308 "Characterization and expression of two chicken cDNAs
encoding ubiquitin fused to ribosomal proteins of 52 and 80 amino acids." Gene 1997;195:313-319.
[2] Redman KL, Rechsteiner M; Medline: 89181932 "Identification of the long ubiquitin extension as ribosomal
protein S27a." Nature 1989;338:438-440.

[3] Finley D, Bartel B, Varshavsky A; Medline: 89181925 "The tails of ubiquitin precursors are ribosomal proteins
whose fusion to ubiquitin facilitates ribosome biogenesis." Nature 1989;338:394-401.

[2191] 874. (Surp)
Surp module

[1] Denhez F, Lafyatis R; Medline: 94286805 "Conservation of regulated alternative splicing and identification of
functional domains in vertebrate homologs to the Drosophila splicing regulator suppressor-of-white-apricot." J Biol Chem
1994;269:16170-16179.

[2192] This domain is also known as the SWAP domain. SWAP stands for Suppressor-of-White-APricot. It has been
suggested that these domains may be RNA binding [1].

Number of members: 32 

[2193] 875. (TFIIE)
TFIIE alpha subunit
The general transcription factor TFIIE has an essential role in eukaryotic transcription initiation together with RNA
polymerase II and other general factors. Human TFIIE consists of two subunits, TFIIE-alpha Swiss:P29083 and
TFIIE-beta Swiss:P29084, and joins the preinitiation complex after RNA polymerase II and TFIIF [1]. This family consists of
the conserved amino terminal region of eukaryotic TFIIE-alpha [2] and proteins from archaebacteria that are presumed
to be TFIIE-alpha subunits also, Swiss:O29501 [3].

Number of members: 12 
[2194] 

[1] Ohkuma Y, Sumimoto H, Hoffmann A, Shimasaki S, Horikoshi M, Roeder RG; Medline: 92065982 "Structural
motifs and potential sigma homologies in the large subunit of human general transcription factor TFIIE." Nature
1991;354:398-401.

[2] Ohkuma Y, Hashimoto S, Roeder RG, Horikoshi M; Medline: 93087200 "Identification of two large subdomains
in TFIIE-alpha on the basis of homology between Xenopus and human sequences." Nucleic Acids Res 1992;20:
5838-5838.

[3] Klenk HP, Clayton RA, Tomb JF, White O, Nelson KE, Ketchum KA, Dodson RJ, Gwinn M, Hickey EK, Peterson
JD, Richardson DL, Kerlavage AR, Graham DE, Kyrpides NC, Fleischmann RD, Quackenbush J, Lee NH, Sutton
GG, Gill S, Kirkness EF, Dougherty BA, McKenney K, Adams MD, Loftus B, Venter JC, et al; Medline: 98049343
"The complete genome sequence of the hyperthermophilic, sulphate-reducing archaeon Archaeoglobus fulgidus."
Nature 1997;390:364-370.

[2195] 876. (Transglut_core) 

[2196] Cross-reference(s): PS00547; TRANSGLUTAMINASES

Transglutaminases (EC 2.3.2.13) (TGase) [1,2] are calcium-dependent enzymes that catalyze the cross-linking of
proteins by promoting the formation of isopeptide bonds between the gamma-carboxyl group of a glutamine in one
polypeptide chain and the epsilon-amino group of a lysine in a second polypeptide chain. TGases also catalyze the
conjugation of polyamines to proteins. The best known transglutaminase is blood coagulation factor XIII, a plasma
tetrameric protein composed of two catalytic A subunits and two non-catalytic B subunits. Factor XIII is responsible
for cross-linking fibrin chains, thus stabilizing the fibrin clot. Other forms of transglutaminases are widely distributed
in various organs, tissues and body fluids. Sequence data is available for the following forms of TGase:

Transglutaminase K (TGase K), a membrane-bound enzyme found in mammalian epidermis and important for the
formation of the cornified cell envelope (gene TGM1).

Tissue transglutaminase (TGase C), a monomeric ubiquitous enzyme located in the cytoplasm (gene TGM2).
Transglutaminase 3, responsible for the later stages of cell envelope formation in the epidermis and the hair follicle
(gene TGM3).

Transglutaminase 4 (gene TGM4).

[2197] A conserved cysteine is known to be involved in the catalytic mechanism of TGases. The erythrocyte membrane
band 4.2 protein, which probably plays an important role in regulating the shape of erythrocytes and their
mechanical properties, is evolutionarily related to TGases. However, the active site cysteine is substituted by an alanine
and the 4.2 protein does not show TGase activity.
[2198] Consensus pattern: [GT]-Q-[CA]-W-V-x-[SA]-[GA]-[IVT]-x(2)-T-x-[LMSC]-R-[CSA]-[LV]-G [The first C is the
active site residue]. Sequences known to belong to this class detected by the pattern: ALL. Other sequence(s) detected
in SWISS-PROT: NONE.
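
For readers who wish to scan a sequence with a consensus pattern of this kind, the PROSITE-style notation can be translated mechanically into a regular expression. The Python sketch below is offered as an illustration under stated assumptions and is not part of the described method: it handles only the notation elements that appear in such patterns (single residues, `x`, bracketed alternatives, braced exclusions and repeat counts), the function names `prosite_to_regex` and `find_motif` are hypothetical, and the test sequence is made up.

```python
import re

# One pattern element: a residue, x, [alternatives] or {exclusions},
# optionally followed by a repeat count such as (2) or (16,18).
ELEMENT = re.compile(r"^(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?$")


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression."""
    pieces = []
    for element in pattern.split("-"):
        match = ELEMENT.match(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = match.groups()
        if core == "x":
            core = "."                          # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"      # any residue except those listed
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        pieces.append(core)
    return "".join(pieces)


# Transglutaminase active-site pattern as given in the text.
TGASE = "[GT]-Q-[CA]-W-V-x-[SA]-[GA]-[IVT]-x(2)-T-x-[LMSC]-R-[CSA]-[LV]-G"
TGASE_REGEX = prosite_to_regex(TGASE)


def find_motif(sequence, pattern=TGASE):
    """Return (start, matched_text) for each non-overlapping pattern hit."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start(), m.group()) for m in regex.finditer(sequence)]


# Made-up sequence containing one instance of the motif.
print(find_motif("AAGQCWVFAGVLNTVLRCLGAA"))  # -> [(2, 'GQCWVFAGVLNTVLRCLG')]
```

The same converter applies to the other consensus patterns in this section, for example one containing a ranged gap such as x(16,18).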

[2199] [ 1] Ichinose A., Bottenus R.E., Davie E.W. J. Biol. Chem. 265:13411-13414(1990). [ 2] Greenberg C.S.,
Birckbichler P.J., Rice R.H. FASEB J. 5:3071-3077(1991).
[2200] 877. (TruB_N)
TruB family pseudouridylate synthase (N terminal domain)

Members of this family are involved in modifying bases in RNA molecules. They carry out the conversion of uracil
bases to pseudouridine. This family includes TruB, a pseudouridylate synthase that specifically converts uracil 55 to
pseudouridine in most tRNAs. This family also includes Cbf5p, which modifies rRNA [2].

Number of members: 33

[2201] 

[1] Nurse K, Wrzesinski J, Bakin A, Lane BG, Ofengand J; Medline: 96071944 "Purification, cloning, and properties
of the tRNA psi 55 synthase from Escherichia coli." RNA 1995;1:102-112.

[2] Lafontaine DLJ, Bousquet-Antonelli C, Henry Y, Caizergues-Ferrer M, Tollervey D; Medline: 98130621 "The box
H + ACA snoRNAs carry Cbf5p, the putative rRNA pseudouridine synthase." Genes Dev 1998;12:527-537.

[2202] 878. (UDPGP)
UTP--glucose-1-phosphate uridylyltransferase

This family consists of UTP--glucose-1-phosphate uridylyltransferases, EC 2.7.7.9, also known as UDP-glucose
pyrophosphorylase (UDPGP) and glucose-1-phosphate uridylyltransferase. UTP--glucose-1-phosphate
uridylyltransferase catalyses the interconversion of MgUTP + glucose-1-phosphate and UDP-glucose + MgPPi [1]. UDP-glucose
is an important intermediate in mammalian carbohydrate interconversion involved in various metabolic roles depending
on tissue type [1]. In Dictyostelium (slime mold), mutants in this enzyme abort the development cycle [2]. Also within
the family are UDP-N-acetylglucosamine pyrophosphorylase Swiss:Q16222 or AGX1 [3] and two hypothetical proteins from
Borrelia burgdorferi, the Lyme disease spirochaete, Swiss 01it8P3 and Swiss CjtCWO.

Number of members: 18 

[2203]

[1] Duggleby RG, Chao YC, Huang JG, Peng HL, Chang HY; Medline: 96202932 "Sequence differences between
human muscle and liver cDNAs for UDPglucose pyrophosphorylase and kinetic properties of the recombinant
enzymes expressed in Escherichia coli." Eur J Biochem 1996;235:173-179.

[2] Ragheb JA, Dottin RP; Medline: 87231075 "Structure and sequence of a UDP glucose pyrophosphorylase gene
of Dictyostelium discoideum." Nucleic Acids Res 1987;15:3891-3906.

[3] Mio T, Yabe T, Arisawa M, Yamada-Okabe H; Medline: 9626910? "The eukaryotic UDP-N-acetylglucosamine
pyrophosphorylases. Gene cloning, protein expression, and catalytic mechanism." J Biol Chem 1998;273:
14392-14397.

[2204] 879. (UPF0044)
Uncharacterized protein family UPF0044 signature
Cross-reference(s): PS01301; UPF0044
The following uncharacterized proteins have been shown [1] to be highly similar:

- Bacillus subtilis hypothetical protein yqeI.
- Escherichia coli hypothetical protein yhbY and HI1333, the corresponding Haemophilus influenzae protein.
- Methanococcus jannaschii hypothetical protein MJ0652.

These are small proteins of 10 to 15 Kd. They can be picked up in the database by the following pattern. This pattern
is located in the N-terminal part of these proteins.

[2205] Consensus pattern: L-[ST]-x(3)-K-x(3)-[KR]-[SGA]-x-[GA]-H-x-L-x-P-[LIV]-x(2)-[LIV]-[GA]-x(2)-G. Sequences
known to belong to this class detected by the pattern: ALL. Other sequence(s) detected in SWISS-PROT: NONE.
[2206] 880. (zf-A20)
A20-like zinc finger
A20- (an inhibitor of cell death)-like zinc fingers. The zinc finger mediates self-association in A20. These fingers
also mediate IL-1-induced NF-kappa B activation.

Number of members: 22 

[2207] 

[1] Heyninck K, Beyaert R; Medline: 99126071 "The cytokine-inducible zinc finger protein A20 inhibits IL-1-induced
NF-kappaB activation at the level of TRAF6." FEBS Lett 1999;442:147-150.

[2] De Valck D, Heyninck K, Van Criekinge W, Contreras R, Beyaert R, Fiers W; Medline: 96390831 "A20, an inhibitor
of cell death, self-associates by its zinc finger domain." FEBS Lett 1996;384:61-64.
[3] Song HY, Rothe M, Goeddel DV; Medline: 96270809 "The tumor necrosis factor-inducible zinc finger protein
A20 interacts with TRAF1/TRAF2 and inhibits NF-kappaB activation." Proc Natl Acad Sci U S A 1996;93:6721-6725.
[4] Opipari AW Jr, Boguski MS, Dixit VM; Medline: 90368626 "The A20 cDNA induced by tumor necrosis factor
alpha encodes a novel type of zinc finger protein." J Biol Chem 1990;265:14705-14708.

[2208] 881. (zf-PARP)

Poly(ADP-ribose) polymerase zinc finger domain

Cross-reference(s): PS00347; PARP_ZN_FINGER_1 PS50084; PARP_ZN_FINGER_2

[2209] Poly(ADP-ribose) polymerase (EC 2.4.2.30) (PARP) [1,2] is a eukaryotic enzyme that catalyzes the covalent
attachment of ADP-ribose units from NAD(+) to various nuclear acceptor proteins. This post-translational modification
of nuclear proteins is dependent on DNA. It appears to be involved in the regulation of various important cellular
processes such as differentiation, proliferation and tumor transformation, as well as in the regulation of the molecular
events involved in the recovery of the cell from DNA damage. Structurally, PARP, about 1000 amino-acid residues long,
consists of three distinct domains: an N-terminal zinc-dependent DNA-binding domain, a central automodification
domain and a C-terminal NAD-binding domain. The DNA-binding region contains a pair of zinc finger domains which
have been shown to bind DNA in a zinc-dependent manner. The zinc finger domains of PARP seem to bind specifically
to single-stranded DNA. DNA ligase III [3] contains, in its N-terminal section, a single copy of a zinc finger highly similar
to those of PARP.

[2210] Consensus pattern: C-[KR]-x-C-x(3)-I-x-K-x(3)-[RG]-x(16,18)-W-[FYH]-H-x(2)-C [The three C's and the H are
zinc ligands]. Sequences known to belong to this class detected by the pattern: ALL. Other sequence(s) detected in
SWISS-PROT: NONE. Sequences known to belong to this class detected by the profile: ALL. Other sequence(s) detected
in SWISS-PROT: NONE.

[2211] Note: This documentation entry is linked to both signature patterns and a profile. As the profile is much more
sensitive than the patterns, you should use it if you have access to the necessary software tools to do so.

[ 1] Althaus F.R., Richter C.R. Mol. Biol. Biochem. Biophys. 37:1-126(1987).

[ 2] de Murcia G., Menissier de Murcia J. Trends Biochem. Sci. 19:172-176(1994).

[ 3] Wei Y.-F., Robins P., Carter K., Caldecott K., Pappin D.J.C., Yu G.-L., Wang R.-P., Shell B.K., Nash R.A., Schar
P., Barnes D.E., Haseltine W.A., Lindahl T. Mol. Cell. Biol. 15:3206-3216(1995).

A. Asparaginase_2

[2212] Asparaginase II (L-asparagine amidohydrolase II) is an extracellular protein that may be associated with the
cell wall and whose expression is affected by the availability of nitrogen. Asparaginase II catalyzes the reaction
L-Asparagine + H2O = L-Aspartate + NH3. As many leukemias have high requirements for aspartic acid, asparaginase
II proteins are useful as reagents for screening compounds for activity as leukemia chemotherapy products.
Asparaginase II protein can also be over- or under-expressed to alter amino acid content in plant tissues or to modify
nitrogen fixation and/or nitrogen metabolism in plants.

[2213] Ref: Bon et al. (1997) Appl Biochem Biotechnol 63-65: 203-12

B. Chloroa_b-bind

[2214] Chlorophyll a-b binding proteins are located in the thylakoid membranes of the chloroplast and bind chlorophyll
a and chlorophyll b, thereby triggering a chemical reaction (photosynthesis). These proteins are useful in controlling
the rate, efficiency and/or output of photosynthesis. Overexpression of chlorophyll a-b binding proteins is expected to
increase the rate of photosynthesis.

Ref: Leutwiler et al. (1986) Nucleic Acids Res 14: 4051-84 Brandt et al. (1992) Plant Mol Biol 19: 699-703

C. DMRL_synthase

[2215] DMRL Synthase (6,7-Dimethyl-8-Ribityllumazine Synthase) catalyzes the last step in riboflavin (Vitamin B2)
synthesis, condensing 5-amino-6-(1'-D)-ribityl-amino-2,4(1H,3H)-Pyrimidinedione with L-3,4-Dihydroxy-2-Butanone
4-Phosphate, producing 6,7-Dimethyl-8-(1-D-Ribityl)Lumazine. The enzyme forms a homopentamer. Engineering of
these proteins or those with homologous sequences/structures may allow control of the amounts of vitamin B2 available
in plants and/or accumulation of pigment, as well as altering reactions requiring hydrogen ion carriers/transmitters.
Ref: Garcia-Ramirez et al. (1995) J Biol Chem 270: 23801-7

D. E1_N

[2216] These proteins are ATP-dependent DNA helicases that are required for initiation of viral DNA replication. They
form a complex with the viral E2 protein. The E1-E2 complex binds to the replication origin, which contains binding sites
for both proteins. The majority of sequences known for this group of proteins are from various papillomaviruses, a type
of double stranded DNA virus. In plants, the prototype double stranded DNA virus is Cauliflower Mosaic virus (CaMV).
Manipulation of these proteins, especially to produce variant proteins that form non-productive complexes, enables
production of plants that are resistant to infection by double stranded DNA viruses.

Ref: Yang et al. (1993) PNAS USA 90: 5086-90

Ustav and Stenlund (1991) EMBO J 10: 449-57
Callaway et al. (1996) Mol Plant Microbe Interact 9: 810-8

[2217] Elongation Factor-1 is composed of four subunits: alpha, beta, delta and gamma. Gamma subunits are
presumed to play a role in anchoring the complex to other cellular components. Studies of EF-1 genes in plants suggest
that different forms of the EF-1 subunits may be expressed in particular organs or in response to stress. Manipulation
of the activity of these proteins, either by altered expression level or by structural mutation, may result in the
accumulation of a particular protein in a chosen organ or allow production of particular proteins during stress conditions.

Ref: Kinzy et al. (1994) NAR 22: 2703-7 Dunn et al. (1993) Plant Mol Biol 23: 221-5 Aguilar et al. (1991) Plant Mol
Biol 17: 351-60

F. ENV_polyprotein

[2218] This family comprises the envelope or coat proteins known from a number of different retroviruses. In
mammalian species, retroviruses are responsible for diseases such as leukemia and HIV. In plants, retroviruses are known
in both monocot (e.g. Zeon-1) and dicot (e.g. Arabidopsis and tobacco) species and have been shown to induce mutant
alleles at new loci. Engineering of plant ENV proteins may allow mobilization or targeting of endogenous or introduced
retroviruses, in essence generating a new method for mutant production, gene tagging and the like.

Ref: Mamoun et al. (1990) J Virol 64: 4180-8 Grandbastien et al. (1989) Nature 337: 376-80 Wright and Voytas (1998)
Genetics 149: 703-15

[2219] Proteins having this domain (previously known as the glycosyl hydrolase family 5 domain) catalyze the
endohydrolysis of 1,4-β-D-glucosidic linkages in cellulose. Numerous plant proteins with this domain exist and are
expressed in an organ specific manner. They are involved in the fruit ripening process, in cell elongation and in plant
reproduction. Modulation of the activity of these proteins, either by over- or under-expression or by mutation of the
polypeptide, could be used to affect post-harvest physiology (e.g. rate of ripening) or for engineering reproductive
sterility.

Ref: Giorda et al. (1990) Biochemistry 29: 7264-9 Tucker et al. (1988) Plant Physiol 88: 1257-82 Shani et al. (1997)
43: 837-42 Milligan and Gasser (1995) Plant Mol Biol 28: 691-711

H. Glycosyl_hydr14

[2220] The β-amylases (family 14 of glycosyl hydrolases) catalyze the hydrolysis of 1,4-α-glucosidic linkages in
polysaccharides and remove successive maltose units from the non-reducing ends of the chains. Mutants of β-amylase
in Arabidopsis exhibited altered degradation of starch throughout the diurnal cycle. In addition, the mutant phenotypes
indicated that these enzymes not only affect carbohydrate metabolism/catabolism, but also influence the amount of
pigment stored within particular cells. Manipulation of the β-amylase genes enables control of plant pigmentation (for
example, fibre pigment in cotton) as well as carbohydrate synthesis and degradation.

Ref: Zeeman et al. (1998) Plant J 15: 357-65 Hirano and Nakamura (1997) Plant Physiol 114: 5675-82 Kitamoto et
al. (1988) J Bacteriol 170: 5848-54

I. Glycosyl_hydr15

[2221] Glycosyl hydrolases from family 15 (such as 1,4-alpha-D-glucan glucohydrolase) catalyze the hydrolysis of
terminal 1,4-linked alpha-D-glucose residues successively from the non-reducing ends of the chains, resulting in the
release of β-D-glucose. In plants, these proteins have been tied to the mobilisation of the xyloglucan stored in the
cotyledonary cell walls. Proteins such as these could be varied to affect the rate of plant growth (for example, during
germination), storage and/or use of glucose and other sugars by plant tissues, and alteration of the properties, such
as elasticity, of plant cell walls.

Ref: Crombie et al. (1998) Plant J 15: 27-38 Hata et al. (1991) Agric Biol Chem 55: 941-9

[2222] Members of the family 20 glycosyl hydrolases catalyze the hydrolysis of terminal non-reducing
N-acetyl-D-hexosamine residues in N-acetyl-β-D-hexosaminides. N-acetyl-β-glucosaminidase belongs to this family and exists in
several different forms (consisting of various combinations of alpha and beta chains) depending on the organism.
Family 20 glycosyl hydrolases have been implicated in lysosomal storage diseases (such as Sandhoff disease) and
glycogen storage disease in humans. These types of proteins are also responsible for the hydrolysis of chitin. In plants,
these proteins could be useful in controlling carbohydrate catabolism, thereby influencing the amount of sugars
available for storage and/or use in other metabolic pathways. In addition, it is possible that such proteins could be used to
engineer an endogenous insect protection mechanism, e.g. by secretion of a chitin-hydrolyzing composition by the
plant.

Ref: Graham et al. (1988) J Biol Chem 263: 18823-9 O'Dowd et al. (1988) Biochemistry 27: 5216-28

K. HMG_box

[2223] The HMG box is a novel type of DNA-binding domain found in a diverse group of proteins. Numerous plant
proteins contain this domain, such as the HMG1/2-like proteins. The expression of some of these HMG proteins appears
to be regulated by circadian rhythms and in a light dependent manner, occurring at higher levels in roots, for example,
and lower levels in light-grown tissues such as cotyledons. Generally, HMG proteins are thought to influence
transcription regulation. In plants, HMGs are believed to have a role in maintaining patterns of circadian-regulated expression
for other genes, suggesting that these proteins could be exploited to control growth and development.

Ref: Laudet et al. (1993) Nucleic Acids Res 21: 2493-501 Zheng et al. (1993) Plant Mol Biol 23: 813-23 Grasser et
al. (1993) Plant Mol Biol 23: 819-25



[2224] Interleukin-2 (IL-2) is produced in mammals by T cells in response to antigenic or mitogenic stimulation and
is crucial for proper regulation and functioning of the immune response. IL-2 is capable of stimulating B cells, monocytes,
lymphokine-activated killer cells, natural killer cells and glioma cells. Plant extracts have also been shown to stimulate
the immune system (for example, mistletoe therapy for human cancer). It is known that IL-2 is involved in feedback
inhibition pathways that impact the inflammatory response as well as the growth inhibition of tumor reactive T cells.
Plant proteins containing IL-2-like sequences are useful as immunity-based therapeutics, acting in a manner similar
to IL-2 in mammals.

Ref: Heike et al. (1997) Scand J Immunol 45: 221-6 Ariel et al. (1998) J Immunol 161: 2465-72 Schink (1997)
Anticancer Drugs 8 Suppl 1: S47-51

M. Oxidored_FMN 

[2225] NADPH dehydrogenases catalyze the reaction NADPH + acceptor = NADP(+) + reduced acceptor. One
member of this family is yeast "old yellow enzyme" (OYE), which is thought to be involved in oxylipin metabolism. A second
yeast family member is an estrogen binding protein (EBP) that, in addition to binding estrogen, exhibits oxidoreductase
activity. An Arabidopsis homolog to OYE has been described, and estrogen binding proteins in plants have been
reported. Plant proteins from this class have the potential to be used to modify lipid metabolism/catabolism. These
proteins may also have use as therapeutics for breast and prostate cancer and other abnormal growth in steroid-sensitive
tissues.

Ref: Baker et al. (1998) Proc Soc Exp Biol Med 217: 317-21 Schaller and Weiler (1997) J Biol Chem 272: 28086-72
Mandani et al. (1994) PNAS USA 91: 922-6

N. Oxidored_q_2 

[2226] The NADH-plastoquinone oxidoreductases catalyze the reaction NADH + plastoquinone → NAD(+) + plastoquinol.
In plants these reactions occur in the chloroplast and are believed to participate in a chloroplast respiratory
system. Here, the NDH complex is postulated to act as a valve to remove excess reduction equivalents in the
chloroplasts. Manipulation of these proteins may improve the rate or efficiency of photosynthesis.

Ref: Burrows et al. (1998) EMBO J 17: 868-76; Kofer et al. (1998) Mol Gen Genet 258: 166-73; Maier et al. (1995) J
Mol Biol 251: 614-28

O. PABP 

[2227] Polyadenylate binding proteins bind the poly(A) tail of mRNA. Plants, as exemplified by Arabidopsis, contain
numerous PABP genes that are expressed in an organ-specific manner. For example, PABP2 is functional in roots and
shoots, while PABP5 is expressed predominantly in immature flowers. The PABP proteins are implicated in numerous
aspects of posttranscriptional regulation, including mRNA turnover and translational initiation. Control of activity of PABP
proteins provides the ability to control the expression of various genes in particular organs during development.

Ref: Hilson et al. (1993) Plant Physiol 103: 525-33; Belostotsky and Meagher (1993) PNAS USA 90: 6686-90

P. Parvo coat 

[2228] Parvoviruses are linear single-stranded DNA viruses that are encapsulated by three capsid proteins. Plants
are susceptible to infection by single-stranded DNA viruses, such as Maize streak virus (MSV) and various Gemini
viruses. The coat proteins in these plant viruses are critical to the virus life cycle within the plant. For example, the coat
protein of MSV is thought to be involved in intra- and inter-cellular movement within the plant. Engineering of proteins
having similarity to parvoviral coat proteins, especially to produce proteins that interfere with maturation of the virus
particle, enables the production of plants having better resistance to natural plant single-stranded DNA viruses.

Ref I. in et n\ (W'iJ Gen Virol 76 i:*>r0 Rthde et al ( I'^tO > Uiologv 17 1 648-61 

Q. Pkinase_C

[2229] Plant serine/threonine protein kinases possessing this domain are expressed in all tissues and are known to
undergo serine-specific autophosphorylation and specifically phosphorylate two ribosomal proteins, P14 and P16.
During development these proteins predominate during high metabolic activity in growing buds, root tips, leaf margins
and germinating seeds. They are thought to be involved in the control of plant growth and development. In addition,
two genes encoding proteins from this family have been described that help plant cells adapt during cold or high salt
stresses. Consequently, engineering Pkinase_C proteins provides a way to control general growth/development of the
plant as well as a means to provide endogenous protection against environmental stresses.

Ref: Zhang et al. (1994) J Biol Chem 269: 17586-92; Mizoguchi et al. (1995) FEBS Lett 358: 199-204

R. REV

[2230] The REV proteins act post-transcriptionally to relieve negative repression of GAG and ENV production in
retroviruses such as Human Immunodeficiency Virus type 1 (HIV-1). Plants contain retrovirus-like viruses such as
pararetroviruses and retrotransposons (i.e., transposons having long terminal repeats). Plant retrotransposons in
particular have been used to create mutations at various loci, thereby permitting gene isolation, gene tagging and the like.
Manipulation of plant REV proteins enables control of transposition frequencies of corresponding transposable
elements and provides a new tool for genetic engineering of plants.

Ref: Sodroski et al. (1986) Nature 321: 412-7; Franchini et al. (1989) PNAS USA 86: 2433-7; Marquet et al. (1995);
Grandbastien et al. (1989) Nature 337: 376-80; Wright and Voytas (1998) Genetics 149: 703-15

S. RuBisCo_small

[2231] Ribulose 1,5-bisphosphate carboxylase/oxygenase (RuBisCo) catalyzes the initial step in the photosynthetic
carbon reduction cycle, adding carbon dioxide to D-ribulose 1,5-bisphosphate to form two molecules of
3-phospho-D-glycerate. RuBisCo is composed of two subunits: one large, which is synthesized in the chloroplast, and one
small, which is synthesized in the cytoplasm and then transported into the chloroplast. The expression of the small
subunit of RuBisCo is light regulated. Manipulation of these proteins could increase the efficiency of photosynthesis
or allow alterations in developmental timing.

Ref: Giuliano et al. (1988) PNAS USA 85: 7089-93; Dedonder et al. (1993) Plant Physiol 101: 801-8

T. Sialyltransf

[2232] Members of the CMP-N-acetylneuraminate-β-galactosamide-α-2,3-sialyltransferase family catalyze the
following reaction:

CMP-N-acetylneuraminate + β-D-galactosyl-1,3-N-acetyl-α-D-galactosaminyl-R → CMP +
α-N-acetylneuraminyl-2,3-β-D-galactosyl-1,3-N-acetyl-α-D-galactosaminyl-R

These proteins are thought to be responsible for the synthesis of the sequence NeuAc-α-2,3-Gal-β-1,3-GalNAc-
found on sugar chains O-linked to threonine or serine, and also as a terminal sequence on certain gangliosides
in mammals. In plants, glycosyltransferases in the Golgi apparatus
synthesize cell wall polysaccharides and elaborate the complex glycans of glycoproteins. Engineering of plant
sialyltransferases allows targeting of proteins to particular cellular locations or enables the making of changes in cell wall
structure.

Ref: Wee et al. (1998) Plant Cell 10: 1759-68; Lee et al. (1994) J Biol Chem 269: 10028-33; Kitagawa and Paulson
(1994) J Biol Chem 269: 1394-401

U. Signal

[2233] Many plant proteins in this family contain sequences similar to those found in both components of the
prokaryotic family of signal transducers known as the two-component systems. This suggests that activation may require a
transfer of a phosphate group between the transmitter domain and the receiver domain. One family member in
Arabidopsis appears to be involved in ethylene (a plant hormone) signal transduction. Other proteins in this family appear
to be involved in the regulation of gene transcription under conditions of environmental stress. Signal proteins can be
exploited to affect plant growth and development and/or control plant responses to stress conditions such as cold,
nutrient availability, etc.

Ref: Chang et al. (1993) Science 262: 539-44; Nagaya et al. (1993) Gene 131: 119-124; Gottfert et al. (1990) PNAS
USA 87: 2680-4

V. vMSA

[2234] vMSA proteins are major surface antigens present on the envelope of various retroviruses. Surface antigens
of retroviruses are often involved in tropism of the virus. Plants contain retrovirus-like viruses such as
pararetroviruses and retrotransposons (i.e., transposons having long terminal repeats). Plant retrotransposons in particular have
been used to create mutants at various loci, thereby permitting gene isolation, gene tagging and the like. Manipulation
of plant vMSA proteins enables control of tropism of plant retroviruses that might be used as genetic engineering tools,
thus enabling targeting of the virus to particular species and/or tissues of plants.

Ref: Okamoto et al. (1988) J Gen Virol 69: 2575-83; Grandbastien et al. (1989) Nature 337: 376-80; Wright and Voytas
(1998) Genetics 149: 703-15

W. zf-CCCH

[2235] This family of proteins is defined by having two CX(8)CX(5)CX(3)H-type zinc finger domains. These proteins
cover a broad range of functions. For example, the COP1 protein acts as a repressor of photomorphogenesis in
darkness; light stimuli abolish this suppressive action. In addition, COP1 protein can function as a negative transcriptional
regulator capable of direct interaction with components of the G-protein signaling pathway. As a second example, a
zf-CCCH protein identified in Arabidopsis appears to be involved in the resistance to DNA damage induced by UV light
and chemical DNA-damaging agents. Overexpression of this class of proteins permits production of plants that are
better suited to adverse environments. Manipulation of expression of zf-CCCH proteins functioning as transcriptional
regulators, such as COP1, enables manipulation of some signal transduction pathways.
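
The CX(8)CX(5)CX(3)H spacing that defines this family can be written directly as a pattern over one-letter amino acid codes. The following is a minimal sketch, not the method used to build the family: the function name and the toy sequence are hypothetical, and a plain regular expression is assumed to be an adequate stand-in for a profile-based domain search.

```python
import re

# C-x(8)-C-x(5)-C-x(3)-H written as a regular expression; "." stands for any residue.
CCCH_MOTIF = re.compile(r"C.{8}C.{5}C.{3}H")


def find_ccch_fingers(protein: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched text) for each non-overlapping motif hit."""
    return [(m.start(), m.group()) for m in CCCH_MOTIF.finditer(protein.upper())]


# Hypothetical toy sequence carrying two motifs separated by a short linker.
toy = (
    "MA"
    + "C" + "A" * 8 + "C" + "G" * 5 + "C" + "S" * 3 + "H"
    + "LLK"
    + "C" + "T" * 8 + "C" + "P" * 5 + "C" + "Q" * 3 + "H"
    + "E"
)

hits = find_ccch_fingers(toy)
has_two_fingers = len(hits) >= 2  # the family is defined by having two such domains
```

A sequence with fewer than two hits would not meet the family definition given above; real domain assignment would normally rely on a curated profile rather than on spacing alone.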

Ref: Pang et al. (1993) Nucleic Acids Res 21: 1647-53; Deng et al. (1992) Cell 71: 791-801

X. zf-RanBP

[2236] Proteins falling within this category contain many X-X-F-G and X-F-X-F-G repeats, and may contain
RANBP1-like or PPIase domains. Plant proteins having domains similar to these include PAS1 and GMSTI. PAS1 has
been shown to have dramatic developmental effects that appear to be correlated with both cell division and cell wall
elongation. GMSTI has high identity to the yeast STI stress-inducible gene and has been shown to be heat inducible.
Proteins such as these may be useful for controlling growth and form of development.

Ref: Vittorioso et al. (1998) Mol Cell Biol 18: 3034-43; Hernandez Torres et al. (1995) 27: 1221-6

[2237] Proteins belonging to this peptidase family are metalloproteases that bind zinc as a cofactor and are located
in the membranes of the endoplasmic reticulum. They function in NH2-terminal proteolytic processing, as shown for
the yeast STE24 gene product. This gene is required for the correct processing of a-factor, a yeast pheromone. Family
M48 peptidases also appear to be required for some prenylation reactions, mediating COOH-terminal CAAX
processing. Prenylation reactions are believed to be involved in the regulation of protein-protein and protein-membrane
interactions. As an example, RAS GTPase activity is regulated in part by localization to the inner side of the plasma
membrane upon prenylation. In plants, proteins from this family could be involved in pollen-stigma interactions such as
those mediating self-pollination vs. outcrossing, or could be members of several secondary metabolism pathways.
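
As an illustration of the COOH-terminal CAAX signal mentioned above, the sketch below checks whether a protein sequence ends in a cysteine followed by two aliphatic residues and one further residue. The set of residues counted as aliphatic and the function name are assumptions made for this example; they are not taken from the specification.

```python
# Assumption: a common working definition of "aliphatic" for the A positions of CAAX.
ALIPHATIC = set("AVLIM")


def has_caax_box(protein: str) -> bool:
    """True if the last four residues fit C-A-A-X (C = Cys, A = aliphatic, X = any)."""
    tail = protein.upper()[-4:]
    return (
        len(tail) == 4
        and tail[0] == "C"
        and tail[1] in ALIPHATIC
        and tail[2] in ALIPHATIC
    )


# Hypothetical peptide tails used only to exercise the check.
ends_in_cvls = has_caax_box("MTEYKCVLS")   # tail CVLS fits the pattern
truncated = has_caax_box("MTEYKCVL")       # tail KCVL does not start with Cys
```

Such a check only flags candidate substrates; whether a given protein is actually prenylated and processed would have to be established experimentally.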
Ref: Fujimura-Kamada et al. (1997) J Cell Biol 136: 271-85; Tam et al. (1998) J Cell Biol 142: 635-49

[2238] The DNA_pol_viral_N domain is located at the N-terminal region of DNA polymerase isolated from several
retroid viruses such as the Cauliflower Mosaic Virus. The domain motif has also been found in numerous other species
from humans to cyanobacteria. In these organisms, this motif seems to be associated with two types of sequences,
retrotransposons and mitochondrial genes. In the mitochondrial sequences this domain is potentially involved in the
self-splicing conducted by group II introns. Various manipulations of this gene in plants allow control of the numerous
retrotransposons endogenous to plant genomes or allow engineering of mitochondrial function, especially to increase
efficiency of energy utilization by cells.

Ref: Chapdelaine and Bonen (1991) Cell 65: 465-72; Ferat and Michel (1993) Nature 364: 358-61; Wilson et al. (1994)
366: 32-8; Cambareri et al. (1994) 242: 653-6?; Gardner et al. (1981) NAR 9: 2871-88; Cummings et al.
(1990) Curr Genet 17: 375-402; Hattori et al. (1986) Nature 321: 625-8

Aa. Calpain_inhib

[2239] This domain is found in calpastatin, an inhibitor protein specific for calpain. Calpain is a non-lysosomal
calcium-dependent intracellular protease that appears to be involved in the dynamic changes of the cytoskeleton, especially
actin-related structures, during early Drosophila embryogenesis [1]. Calpastatins co-exist in cells with calpains, and the
subcellular distribution of calpastatin is thought to be important to calpain regulation [2]. In plants, calpains and
calpastatins could be involved in embryogenesis and non-embryogenic organ regeneration. Mutations occurring in calpain
inhibitor repeat domains would produce developmental abnormalities such as abnormal leaf, root or flower
development.

[2240] Refs 

1 Emori Y and Saigo K (1994) J Biol Chem 269: 25137-42.

2 Mellgren RL, Lane RD, Mericle MT (1989) Biochim Biophys Acta 999: 71-77.

[2241] Chorismate binding domains are present in plant anthranilate synthase (AS) genes. AS genes catalyze the
first step in the biosynthesis of tryptophan by converting chorismate and L-glutamine to anthranilate, pyruvate and
L-glutamate. Some of these genes are involved in feedback inhibition by tryptophan [1] while some are feedback
insensitive [2]. In Arabidopsis, two AS genes have overlapping, but different distributions. One of these AS genes is induced
by wounding and bacterial pathogen infiltration [1]. Mutations in the chorismate binding domain would affect the
production of tryptophan and could influence the plant's defense system. AS gene products can be used for in vitro
synthesis of tryptophan and tryptophan derivatives.

[2242] Refs

1 Niyogi KK, Fink GR (1992) Plant Cell 4: 721-33.

2 Song HS, Brotherton JE, Gonzales RA, Widholm JM (1998) Plant Physiol 117: 533-43.

[2243] Papillomaviruses are encapsulated double stranded DNA viruses. Plants are susceptible to infection by
double stranded DNA viruses such as Cauliflower Mosaic Virus (CaMV). The coat proteins in these plant viruses are critical
to the virus life cycle within the plant. For example, the coat protein of CaMV is thought to be involved in intra- and
inter-cellular movement within the plant [1]. Engineering of proteins having similarity to papillomavirus coat proteins
may enable the production of plants having better resistance to natural plant double stranded DNA viruses.
[2244] Refs

1 Thompson SR, Melcher U (1993) J Gen Virol 74: 1141-8.

Ad. Peptidase_M41

[2245] Proteins belonging to this peptidase family are metalloproteases that bind zinc as a cofactor and are integral
membrane proteins. They seem to be involved in the degradation of carboxy-terminal-tagged cytoplasmic proteins. In
plants these proteins are located in the thylakoid membranes of the chloroplasts, their expression is light regulated,
and they are thought to be involved in degradation of soluble stromal proteins and turn-over of thylakoid proteins [1].
Manipulation of expression and structure of these proteins would have effects on the efficiency of photosynthesis and
the development of chloroplasts.

[2246] Refs

1 Lindahl M, Tabak S, Cseke L, Pichersky E, Andersson B, Adam Z (1996) J Biol Chem 271: 29329-34.

Ae. UPF0051

[2247] There is some evidence that in plants proteins in this family are involved in ATP synthesis in chloroplasts
[1, 2]. Mutations in these proteins or altering their expression would affect the efficiency of photosynthesis and energy
production.
[2248] Refs

1 Kostrzewa M, Zetsche K (1992) J Mol Biol 227: 961-70.

2 Kostrzewa M, Zetsche K (1993) Plant Mol Biol 23.

Af. E7

[2249] Papillomaviruses are encapsulated double stranded DNA viruses. The Papillomavirus early protein (E7) is
known as a potent immortalizing and transforming agent. Transformation by E7 is thought to be mediated by the physical
association of E7 with cellular proteins regulating entry into the cell cycle [1]. The result is entry into the cell cycle and
suppression of terminal differentiation in mammalian cells. Thus, engineering of proteins having similarity to
papillomavirus E7 protein enables the production of plants having altered cellular proliferation characteristics and possibly
altered morphology. For example, overexpression of E7-like proteins would be expected to result in proliferation of
cells of the tissue in which the E7 protein is expressed, perhaps with suppression of differentiation events. Thus, for
example, overexpression of E7-like proteins in meristem cells can result in taller plants, and suppression of leafing and/
or flowering.
[2250] Refs

1 Zwerschke W, Jansen-Dürr P (2000) Adv Cancer Res 78: 1-29.

Ag. Peptidase_U7

[2251] This protein is known to be an integral membrane protein in the cyanobacterium Synechocystis, where it
functions to digest cleaved signal peptides [1]. This activity is necessary to maintain proper secretion of mature proteins
across the membrane. In higher plants this protein may be present in the plastid or chloroplast membranes, where it
would function by enabling protein movement into and out of the chloroplasts. Mutations in this protein would be
expected to affect the development of plastids, including chloroplasts, or alter the energy transfer system within the
chloroplasts, thereby affecting growth and development.
[2252] Refs

1 Kaneko T, Sato S, Kotani H, Tanaka A, Asamizu E, Nakamura Y, Miyajima N, Hirosawa M, Sugiura M, Sasamoto S,
Kimura T, Hosouchi T, Matsuno A, Muraki A, Nakazaki N, Naruo K, Okumura S, Shimpo S, Takeuchi C, Wada T,
Watanabe A, Yamada M, Yasuda M, Tabata S (1996) DNA Res 3: 109-36.

Activities of Polypeptides Comprising Signal Peptides

[2253] Polypeptides comprising signal peptides are a family of proteins that are typically targeted (1) to a particular
organelle or intracellular compartment, (2) to interact with a particular molecule or (3) for secretion outside of a host cell.
Examples of polypeptides comprising signal peptides include, without limitation, secreted proteins, soluble proteins,
receptors, proteins retained in the ER, etc.

[2254] These proteins comprising signal peptides are useful to modulate ligand-receptor interactions, cell-to-cell
communication, signal transduction, intracellular communication, and activities and/or chemical cascades that take
part in an organism outside or within any particular cell.

[2255] One class of such proteins are soluble proteins which are transported out of the cell. These proteins can act
as ligands that bind to a receptor to trigger signal transduction or to permit communication between cells.

[2256] Another class is receptor proteins, which also comprise a retention domain that lodges the receptor protein in
the membrane when the cell transports the receptor to the surface of the cell. Like the soluble ligands, receptors can
also modulate signal transduction and communication between cells.
[2257] In addition, the signal peptide itself can serve as a ligand for some receptors. An example is the interaction of
the ER targeting signal peptide with the signal recognition particle (SRP). Here, the SRP binds to the signal peptide,
halting translation, and the resulting SRP complex then binds to docking proteins located on the surface of the ER,
prompting transfer of the protein into the ER.
[2258] Signal peptide residue composition is described below in Subsection IV.C.1.

III. Methods of Modulating Polypeptide Production

[2259] It is contemplated that polynucleotides of the invention can be incorporated into a host cell or in vitro system
to modulate polypeptide production. For instance, the SDFs prepared as described herein can be used to prepare
expression cassettes useful in a number of techniques for suppressing or enhancing expression.
[2260] As an example, polynucleotides comprising sequences to be transcribed, such as coding sequences, of the
present invention can be inserted into nucleic acid constructs to modulate polypeptide production. Typically, such
sequences to be transcribed are heterologous to at least one element of the nucleic acid construct to generate a chimeric
gene or construct.

[2261] Another example of useful polynucleotides are nucleic acid molecules comprising regulatory sequences of
the present invention. Chimeric genes or constructs can be generated when the regulatory sequences of the invention
are linked to heterologous sequences in a vector construct. Within the scope of the invention are such chimeric genes and/or
constructs.

[2262] Also within the scope of the invention are nucleic acid molecules, whereof at least a part or fragment of these
DNA molecules are presented in REF AND SEQ TABLES 1 AND 2 of the present application, and wherein the coding 
sequence is under the control of its own promoter and/or its own regulatory elements. Such molecules are useful for 
transforming the genome of a host cell or an organism regenerated from said host cell for modulating polypeptide 
production. 

[2263] Additionally, a vector capable of producing the oligonucleotide can be inserted into the host cell to deliver the
oligonucleotide.

[2264] The components to be included in vector constructs are described in more detail both above and
below.

[2265] Whether the chimeric vectors or native nucleic acids are utilized, such polynucleotides can be incorporated
into a host cell to modulate polypeptide production. Native genes and/or nucleic acid molecules can be effective when
exogenous to the host cell.

[2266] Methods of modulating polypeptide expression include, without limitation:
Suppression methods, such as

Antisense
Ribozymes 
Co-suppression 

Insertion of Sequences into the Gene to be Modulated 
Regulatory Sequence Modulation, 

as well as Methods for Enhancing Production, such as 

Insertion of Exogenous Sequences; and 
Regulatory Sequence Modulation. 

III.A. Suppression

[2267] Expression cassettes of the invention can be used to suppress expression of endogenous genes which
comprise the SDF sequence. Inhibiting expression can be useful, for instance, to tailor the ripening characteristics of a fruit
(Oeller et al., Science 254:437 (1991)) or to influence seed size (WO98/07842) or to provoke cell ablation (Mariani et
al., Nature 357: 384-387 (1992)).

[2268] As described above, a number of methods can be used to inhibit gene expression in plants, such as antisense,
ribozyme, introduction of exogenous genes into a host cell, insertion of a polynucleotide sequence into the coding
sequence and/or the promoter of the endogenous gene of interest, and the like.

III.A.1. Antisense

[2269] An expression cassette as described above can be transformed into a host cell or plant to produce an antisense
strand of RNA. For plant cells, antisense RNA inhibits gene expression by preventing the accumulation of mRNA which
encodes the enzyme of interest; see, e.g., Sheehy et al., Proc. Nat. Acad. Sci. USA 85: 8805 (1988), and Hiatt et al.,
U.S. Patent No. 4,801,340.
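
To make the antisense relationship concrete, the following minimal sketch derives the antisense RNA that would pair with the mRNA of a given sense-strand DNA sequence. The function names and the six-base input are hypothetical and serve only as illustration; the sketch assumes an unambiguous A/C/G/T alphabet.

```python
# Complement table for unambiguous DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string."""
    return dna.translate(_COMPLEMENT)[::-1]


def antisense_rna(sense_dna: str) -> str:
    """RNA complementary to the mRNA transcribed from the given sense strand, 5'->3'."""
    return reverse_complement(sense_dna).upper().replace("T", "U")


# Hypothetical six-base fragment: the mRNA would read AUGGCC.
fragment = "ATGGCC"
antisense = antisense_rna(fragment)
```

In an expression cassette this corresponds to placing the transcribed sequence in reverse orientation relative to the promoter, so that the transcript is complementary to the endogenous mRNA.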

III.A.2. Ribozymes

[2270] Similarly, ribozyme constructs can be transformed into a plant to cleave mRNA and down-regulate translation.

III.A.3. Co-suppression

[2271] Another method of suppression is by introducing an exogenous copy of the gene to be suppressed.
Introduction of expression cassettes in which a nucleic acid is configured in the sense orientation with respect to the promoter
has been shown to prevent the accumulation of mRNA. A detailed description of this method is given above.

III.A.4. Insertion of Sequences into the Gene to be Modulated

[2272] Yet another means of suppressing gene expression is to insert a polynucleotide into the gene of interest to
disrupt transcription or translation of the gene.

[2273] Homologous recombination could be used to target a polynucleotide insert to a gene using the Cre-Lox system
(A.C. Vergunst et al., Nucleic Acids Res. 26: 2729 (1998); A.C. Vergunst et al., Plant Mol. Biol. 38: 393 (1998); Albert
et al., Plant J. 7: 649 (1995)).

[2274] In addition, random insertion of polynucleotides into a host cell genome can also be used to disrupt the gene
of interest (Azpiroz-Leehan et al., Trends in Genetics 13: 152 (1997)). In this method, screening for clones from a library
containing random insertions is preferred for identifying those that have polynucleotides inserted into the gene of
interest. Such screening can be performed using probes and/or primers described above based on sequences from REF
AND SEQ TABLES 1 AND 2, fragments thereof, and substantially similar sequences thereto. The screening can also
be performed by selecting clones or any transgenic plants having a desired phenotype.

III.A.5. Regulatory Sequence Modulation

[2275] The SDFs described in REF AND SEQ TABLES 1 AND 2, and fragments thereof, are examples of nucleotides
of the invention that contain regulatory sequences that can be used to suppress or inactivate transcription and/or
translation from a gene of interest, as discussed in I.C.5.

III.A.6. Genes Comprising Dominant-Negative Mutations

[2276] When suppression of production of the endogenous, native protein is desired, it is often helpful to express a
gene comprising a dominant negative mutation. Production of protein variants produced from genes comprising
dominant negative mutations is a useful tool for research. Genes comprising dominant negative mutations can produce a
variant polypeptide which is capable of competing with the native polypeptide, but which does not produce the native
result. Consequently, over expression of genes comprising these mutations can titrate out an undesired activity of the
native protein. For example, the product from a gene comprising a dominant negative mutation of a receptor can be
used to constitutively activate or suppress a signal transduction cascade, allowing examination of the phenotype and
thus the trait(s) controlled by that receptor and pathway. Alternatively, the protein arising from the gene comprising a
dominant-negative mutation can be an inactive enzyme still capable of binding to the same substrate as the native
protein and therefore competes with such native protein.

[2277] Products from genes comprising dominant-negative mutations can also act upon the native protein itself to
prevent activity. For example, the native protein may be active only as a homo-multimer or as one subunit of a
hetero-multimer. Incorporation of an inactive subunit into the multimer with native subunit(s) can inhibit activity.
[2278] Thus, gene function can be modulated in host cells of interest by insertion into these cells of vector constructs
comprising a gene comprising a dominant-negative mutation.
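
The multimer case can be illustrated with a simple calculation. The sketch below rests on two assumptions that are not stated in the specification: subunits assemble at random from a mixed pool, and a single inactive subunit is enough to inactivate the whole multimer. Under those assumptions the active fraction is (1 - f)^n for a mutant subunit fraction f and n subunits per multimer. The function name is hypothetical.

```python
def active_multimer_fraction(mutant_fraction: float, subunits: int) -> float:
    """Fraction of multimers built only from native subunits.

    Assumes random assembly and that one mutant subunit inactivates the multimer.
    """
    if not 0.0 <= mutant_fraction <= 1.0:
        raise ValueError("mutant_fraction must lie between 0 and 1")
    if subunits < 1:
        raise ValueError("subunits must be at least 1")
    return (1.0 - mutant_fraction) ** subunits


# Equal amounts of native and mutant subunits in a hypothetical homotetramer.
tetramer_active = active_multimer_fraction(0.5, 4)  # 0.5 ** 4 = 0.0625
dimer_active = active_multimer_fraction(0.5, 2)     # 0.5 ** 2 = 0.25
```

Under this toy model, the same level of mutant expression suppresses a tetramer more strongly than a dimer, which is one way to see why over-expression of a dominant-negative subunit can titrate out the native activity.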

III.B. Enhanced Expression

[2279] Enhanced expression of a gene of interest in a host cell can be accomplished by either (1) insertion of an
exogenous gene; or (2) promoter modulation.

III.B.1. Insertion of an Exogenous Gene

[2280] Insertion of an expression construct encoding an exogenous gene can boost the number of gene copies
expressed in a host cell.

[2281] Such expression constructs can comprise genes that either encode the native protein that is of interest or
that encode a variant that exhibits enhanced activity as compared to the native protein. Such genes encoding proteins
of interest can be constructed from the sequences from REF AND SEQ TABLES 1 AND 2, fragments thereof, and
substantially similar sequences thereto.

[2282] Such an exogenous gene can include either a constitutive promoter permitting expression in any cell in a host
organism or a promoter that directs transcription only in particular cells or times during a host cell life cycle or in response
to environmental stimuli.

III.B.2. Regulatory Sequence Modulation

[2283] The SDFs of REF AND SEQ TABLES 1 AND 2, and fragments thereof, contain regulatory sequences that can
be used to enhance expression of a gene of interest. For example, some of these sequences contain useful enhancer
elements. In some cases, duplication of enhancer elements or insertion of exogenous enhancer elements will increase
expression of a desired gene from a particular promoter. As other examples, all 11 promoters require binding of a
regulatory protein to be activated, while some promoters may need a protein that signals a promoter binding protein
to expose a polymerase binding site. In either case, over-production of such proteins can be used to enhance
expression of a gene of interest by increasing the activation time of the promoter.

[2284] Such regulatory proteins are encoded by some of the sequences in REF AND SEQ TABLES 1 AND 2,
fragments thereof, and substantially similar sequences thereto.

[2285] Coding sequences for these proteins can be constructed as described above.

IV. Gene Constructs and Vector Construction

[2286] To use isolated SDFs of the present invention or a combination of them or parts and/or mutants and/or fusions
of said SDFs in the above techniques, recombinant DNA vectors which comprise said SDFs and are suitable for
transformation of cells, such as plant cells, are usually prepared. The SDF construct can be made using standard
recombinant DNA techniques (Sambrook et al. 1989) and can be introduced to the species of interest by Agrobacterium-
mediated transformation or by other means of transformation (e.g., particle gun bombardment) as referenced below.
[2287] The vector backbone can be any of those typical in the art, such as plasmids, viruses, artificial chromosomes,
BACs, YACs and PACs, and vectors of the sort described by

(a) BAC: Shizuya et al., Proc. Natl. Acad. Sci. USA 89: 8794-8797 (1992); Hamilton et al., Proc. Natl. Acad. Sci.
USA 93: 9975-9979 (1996);
(b) YAC: Burke et al., Science 236: 806-812 (1987);
(c) PAC: Sternberg N. et al., Proc Natl Acad Sci USA. Jan; 87(1): 103-7 (1990);
(d) Bacteria-Yeast Shuttle Vectors: Bradshaw et al., Nucl Acids Res 23: 4850-4856 (1995);
(e) Lambda Phage Vectors: Replacement vector, e.g., Frischauf et al., J. Mol Biol 170: 827-842 (1983); or Insertion
vector, e.g., Huynh et al., In: Glover NM (ed) DNA Cloning: A practical Approach, Vol. 1 Oxford: IRL Press (1985);
(f) T-DNA gene fusion vectors: Walden et al., Mol Cell Biol 1: 175-194 (1990); and
(g) Plasmid vectors: Sambrook et al., infra.

[2288] Typically, a vector will comprise the exogenous gene, which in its turn comprises an SDF of the present
invention to be introduced into the genome of a host cell, and which gene may be an antisense construct, a ribozyme
construct, chimeraplast, or a coding sequence with any desired transcriptional and/or translational regulatory
sequences, such as promoters, UTRs, and 3' end termination sequences. Vectors of the invention can also include origins of
replication, scaffold attachment regions (SARs), markers, homologous sequences, introns, etc.

[2289] A DNA sequence coding for the desired polypeptide, for example a cDNA sequence encoding a full length
protein, will preferably be combined with transcriptional and translational initiation regulatory sequences which will
direct the transcription of the sequence from the gene in the intended tissues of the transformed plant.
[2290] For example, for over-expression, a plant promoter fragment may be employed that will direct transcription
of the gene in all tissues of a regenerated plant. Alternatively, the plant promoter may direct transcription of an SDF of
the invention in a specific tissue (tissue-specific promoters) or may be otherwise under more precise environmental
control (inducible promoters).

[2291] If proper polypeptide production is desired, a polyadenylation region at the 3'-end of the coding region is
typically included. The polyadenylation region can be derived from the natural gene, from a variety of other plant genes,
or from T-DNA.

[2292] The vector comprising the sequences from genes or SDF of the invention may comprise a marker gene that
confers a selectable phenotype on plant cells. The vector can include promoter and coding sequence, for instance.
For example, the marker may encode biocide resistance, particularly antibiotic resistance, such as resistance to
kanamycin, G418, bleomycin, hygromycin, or herbicide resistance, such as resistance to chlorosulfuron or phosphinothricin.
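
The vector components listed above (promoter, sequence to be transcribed, 3' polyadenylation region, selectable marker, backbone) can be summarized as a small data structure. This is only an organizational sketch: the class names, field names and example values are hypothetical and do not describe any particular construct of the invention.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ExpressionCassette:
    promoter: str               # e.g. constitutive, tissue-specific or inducible
    transcribed_sequence: str   # coding sequence or other sequence to be transcribed
    terminator: str             # polyadenylation / 3' end termination region
    orientation: str = "sense"  # "sense" or "antisense"


@dataclass(frozen=True)
class VectorDesign:
    backbone: str               # e.g. plasmid, BAC, YAC, PAC
    cassette: ExpressionCassette
    selectable_marker: Optional[str] = None

    def summary(self) -> str:
        c = self.cassette
        marker = self.selectable_marker or "no marker"
        return (
            f"{self.backbone}: {c.promoter} -> {c.transcribed_sequence} "
            f"({c.orientation}) -> {c.terminator}; marker: {marker}"
        )


# Hypothetical example values, chosen only to exercise the structure.
design = VectorDesign(
    backbone="plasmid",
    cassette=ExpressionCassette(
        promoter="constitutive promoter",
        transcribed_sequence="SDF coding sequence",
        terminator="T-DNA polyadenylation region",
    ),
    selectable_marker="kanamycin resistance",
)
description = design.summary()
```

Keeping the cassette separate from the backbone mirrors the text, where the same promoter, coding and termination elements can be carried by any of the listed vector types.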

IV.A. Coding Sequences

[2293] Generally, the sequence in the transformation vector and to be introduced into the genome of the host cell
does not need to be absolutely identical to an SDF of the present invention. Also, it is not necessary for it to be full
length, relative to either the primary transcription product or fully processed mRNA. Furthermore, the introduced
sequence need not have the same intron or exon pattern as a native gene. Also, heterologous non-coding segments can
be incorporated into the coding sequence without changing the desired amino acid sequence of the polypeptide to be
produced.

IV.B. Promoters

[2294] As explained above, introducing an exogenous SDF from the same species or an orthologous SDF from
another species can modulate the expression of a native gene corresponding to that SDF of interest. Such an SDF
construct can be under the control of either a constitutive promoter or a highly regulated inducible promoter (e.g., a
copper inducible promoter). The promoter of interest can initially be either endogenous or heterologous to the species
in question. When re-introduced into the genome of said species, such promoter becomes exogenous to said species.
Over-expression of an SDF transgene can lead to co-suppression of the homologous endogenous sequence, thereby
creating some alterations in the phenotypes of the transformed species, as demonstrated by similar analysis of the
chalcone synthase gene (Napoli et al., Plant Cell 2:279 (1990) and van der Krol et al., Plant Cell 2:291 (1990)). If an
SDF is found to encode a protein with desirable characteristics, its over-production can be controlled so that its
accumulation can be manipulated in an organ- or tissue-specific manner utilizing a promoter having such specificity.

[2295] Likewise, if the promoter of an SDF (or an SDF that includes a promoter) is found to be tissue-specific or
developmentally regulated, such a promoter can be utilized to drive or facilitate the transcription of a specific gene of
interest (e.g., seed storage protein or root-specific protein). Thus, the level of accumulation of a particular protein can
be manipulated or its spatial localization in an organ- or tissue-specific manner can be altered.


[2296] SDFs of the present invention containing signal peptides are indicated in the REF and SEQ TABLES. In some
cases it may be desirable for the protein encoded by an introduced exogenous or orthologous SDF to be targeted (1)
to a particular organelle intracellular compartment, (2) to interact with a particular molecule such as a membrane
molecule or (3) for secretion outside of the cell harboring the introduced SDF. This will be accomplished using a signal
peptide.

[2297] Signal peptides direct protein targeting, are involved in ligand-receptor interactions and act in cell to cell
communication. Many proteins, especially soluble proteins, contain a signal peptide that targets the protein to one of
several different intracellular compartments. In plants, these compartments include, but are not limited to, the
endoplasmic reticulum (ER), mitochondria, plastids (such as chloroplasts), the vacuole, the Golgi apparatus, protein
storage vesicles (PSV) and, in general, membranes. Some signal peptide sequences are conserved, such as the
Asn-Pro-Ile-Arg amino acid motif found in the N-terminal propeptide signal that targets proteins to the vacuole (Marty
(1999) The Plant Cell 11:587-599). Other signal peptides do not have a consensus sequence per se, but are largely
composed of hydrophobic amino acids, such as those signal peptides targeting proteins to the ER (Vitale and Denecke
(1999) The Plant Cell 11:615-628). Still others do not appear to contain either a consensus sequence or an identified
common secondary sequence, for instance the chloroplast stromal targeting signal peptides (Keegstra and Cline (1999)
The Plant Cell 11:557-570). Furthermore, some targeting peptides are bipartite, directing proteins first to an organelle
and then to a membrane within the organelle (e.g., within the thylakoid lumen of the chloroplast; see Keegstra and Cline
(1999) The Plant Cell 11:557-570). In addition to the diversity in sequence and secondary structure, placement of the
signal peptide is also varied. Proteins destined for the vacuole, for example, have targeting signal peptides found at
the N-terminus, at the C-terminus and at a surface location in mature, folded proteins. Signal peptides also serve as
ligands for some receptors.
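
As an illustration of how a conserved targeting motif such as Asn-Pro-Ile-Arg (NPIR in one-letter code) might be located in a polypeptide, the following is a minimal sketch in Python. The function name and the example sequence are hypothetical and are not taken from the REF or SEQ TABLES; a motif match alone would not establish vacuolar targeting.

```python
def find_motif(protein: str, motif: str = "NPIR") -> list:
    """Return the 0-based start position of every occurrence of `motif`."""
    protein = protein.upper()
    positions = []
    start = protein.find(motif)
    while start != -1:
        positions.append(start)
        # Continue one residue later so overlapping matches are not missed.
        start = protein.find(motif, start + 1)
    return positions


if __name__ == "__main__":
    # Hypothetical N-terminal region, invented purely for illustration.
    example = "MKAFTLALFLALSLYLLPNPAHSRFNPIRLPTTHEPA"
    for pos in find_motif(example):
        print(f"NPIR motif found at residue {pos + 1}")
```

Restricting the search to the first few dozen residues would be one way to focus on N-terminal propeptides, since the motif cited above is found in the N-terminal propeptide signal.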

[2298] These characteristics of signal proteins can be used to more tightly control the phenotypic expression of
introduced SDFs. In particular, associating the appropriate signal sequence with a specific SDF can allow sequestering
of the protein in specific organelles (plastids, as an example), secretion outside of the cell, targeting interaction with
particular receptors, etc. Hence, the inclusion of signal proteins in constructs involving the SDFs of the invention
increases the range of manipulation of SDF phenotypic expression. The nucleotide sequence of the signal peptide can
be isolated from characterized genes using common molecular biological techniques or can be synthesized in vitro.

[2299] In addition, the native signal peptide sequences, both amino acid and nucleotide, described in the REF and
SEQ tables can be used to modulate polypeptide transport. Further variants of the native signal peptides described in
the REF and SEQ tables are contemplated. Insertions, deletions, or substitutions can be made. Such variants will
retain at least one of the functions of the native signal peptide as well as exhibiting some degree of sequence identity
to the native sequence.

[2300] Also, fragments of the signal peptides of the invention are useful and can be fused with other signal peptides
of interest to modulate transport of a polypeptide.

V. Transformation Techniques

[2301] A wide range of techniques for inserting exogenous polynucleotides are known for a number of host cells,
including, without limitation, bacterial, yeast, mammalian, insect and plant cells.

[2302] Techniques for transforming a wide variety of higher plant species are well known and described in the
technical and scientific literature. See, e.g., Weising et al., Ann. Rev. Genet. 22:421 (1988); and Christou, Euphytica,
v. 85, n. 1-3: 13-27 (1995).

[2303] DNA constructs of the invention may be introduced into the genome of the desired plant host by a variety of
conventional techniques. For example, the DNA construct may be introduced directly into the genomic DNA of the
plant cell using techniques such as electroporation and microinjection of plant cell protoplasts, or the DNA constructs
can be introduced directly to plant tissue using ballistic methods, such as DNA particle bombardment. Alternatively,
the DNA constructs may be combined with suitable T-DNA flanking regions and introduced into a conventional
Agrobacterium tumefaciens host vector. The virulence functions of the Agrobacterium tumefaciens host will direct the
insertion of the construct and adjacent marker into the plant cell DNA when the cell is infected by the bacteria
(McCormac et al., Mol. Biotechnol. 8:199 (1997); Hamilton, Gene 200:107 (1997); Salomon et al., EMBO J. 3:141
(1984); Herrera-Estrella et al., EMBO J. 2:987 (1983)).

[2304] Microinjection techniques are known in the art and well described in the scientific and patent literature. The
introduction of DNA constructs using polyethylene glycol precipitation is described in Paszkowski et al., EMBO J.
3:2717 (1984). Electroporation techniques are described in Fromm et al., Proc. Natl. Acad. Sci. USA 82:5824 (1985).
Ballistic transformation techniques are described in Klein et al., Nature 327:773 (1987). Agrobacterium tumefaciens
mediated transformation techniques, including disarming and use of binary or cointegrate vectors, are well described
in the scientific literature. See, for example, Hamilton, C.M., Gene 200:107 (1997); Muller et al., Mol. Gen. Genet.
207:171 (1987); Komari et al., Plant J. 10:165 (1996); Venkateswarlu et al., Biotechnology 9:1103 (1991) and Gleave,
A.P., Plant Mol. Biol. 20:1203 (1992); Graves and Goldman, Plant Mol. Biol. 7:34 (1986) and Gould et al., Plant
Physiology 95:428 (1991).

[2305] Transformed plant cells which are derived by any of the above transformation techniques can be cultured to
regenerate a whole plant that possesses the transformed genotype and thus the desired phenotype, such as
seedlessness. Such regeneration techniques rely on manipulation of certain phytohormones in a tissue culture growth
medium, typically relying on a biocide and/or herbicide marker which has been introduced together with the desired
nucleotide sequences. Plant regeneration from cultured protoplasts is described in Evans et al., Protoplasts Isolation
and Culture in "Handbook of Plant Cell Culture," pp. 124-176, MacMillan Publishing Company, New York, 1983; and
Binding, Regeneration of Plants, Plant Protoplasts, pp. 21-73, CRC Press, Boca Raton, 1988. Regeneration can also
be obtained from plant callus, explants, organs, or parts thereof. Such regeneration techniques are described generally
in Klee et al., Ann. Rev. of Plant Phys. 38:467 (1987). Regeneration of monocots (rice) is described by Hosoyama et al.
(Biosci. Biotechnol. Biochem. 58:1500 (1994)) and by Ghosh et al. (J. Biotechnol. 32:1 (1994)). The nucleic acids of
the invention can be used to confer desired traits on essentially any plant.

[2306] Thus, the invention has use over a broad range of plants, including species from the genera Anacardium,
Arachis, Asparagus, Atropa, Avena, Brassica, Citrus, Citrullus, Capsicum, Carthamus, Cocos, Coffea, Cucumis,
Cucurbita, Daucus, Elaeis, Fragaria, Glycine, Gossypium, Helianthus, Heterocallis, Hordeum, Hyoscyamus, Lactuca,
Linum, Lolium, Lupinus, Lycopersicon, Malus, Manihot, Majorana, Medicago, Nicotiana, Olea, Oryza, Panicum,
Pannesetum, Persea, Phaseolus, Pistachia, Pisum, Pyrus, Prunus, Raphanus, Ricinus, Secale, Senecio, Sinapis,
Solanum, Sorghum, Theobromus, Trigonella, Triticum, Vicia, Vitis, Vigna, and Zea.

[2307] One of skill will recognize that after the expression cassette is stably incorporated in transgenic plants and
confirmed to be operable, it can be introduced into other plants by sexual crossing. Any of a number of standard
breeding techniques can be used, depending upon the species to be crossed.

[2308] The particular sequences of SDFs identified are provided in the attached REF AND SEQ TABLES 1 AND 2.
One of ordinary skill in the art, having this data, can obtain cloned DNA fragments, synthetic DNA fragments or
polypeptides constituting desired sequences by recombinant methodology known in the art or described herein.

EXAMPLES 

[2309] The invention is illustrated by way of the following examples. The invention is not limited by these examples
as the scope of the invention is defined solely by the claims following.

[2310] A number of the nucleotide sequences disclosed in REF AND SEQ TABLES 1 AND 2 herein as representative
of the SDFs of the invention can be obtained by sequencing genomic DNA (gDNA) and/or cDNA from corn plants
grown from HYBRID SEED # 35A19, purchased from Pioneer Hi-Bred International, Inc., Supply Management, P.O.
Box 256, Johnston, Iowa 50131-0256.

[2311] A number of the nucleotide sequences disclosed in REF AND SEQ TABLES 1 AND 2 herein as representative
of the SDFs of the invention can also be obtained by sequencing genomic DNA from Arabidopsis thaliana,
Wassilewskija ecotype, or by sequencing cDNA obtained from mRNA from such plants, as described below. This is a
true breeding strain. Seeds of the plant are available from the Arabidopsis Biological Resource Center at the Ohio
State University under the accession number CS2360. Seeds of this plant were deposited under the terms and
conditions of the Budapest Treaty at the American Type Culture Collection, Manassas, VA on August 31, 1999, and
were assigned ATCC No. PTA-595.

[2312] Other methods for cloning full-length cDNA are described, for example, by Seki et al., Plant Journal 15:707-720
(1998) "High-efficiency cloning of Arabidopsis full-length cDNA by biotinylated Cap trapper"; Maruyama et al., Gene
138:171 (1994) "Oligo-capping a simple method to replace the cap structure of eukaryotic mRNAs with
oligoribonucleotides"; and WO 98/34981.

[2313] Tissues were, or each organ was, individually pulverised and frozen in liquid nitrogen. Next, the samples were
homogenized in the presence of detergents, and then centrifuged. The debris and nuclei were removed from the sample
and more detergents were added to the sample. The sample was centrifuged and the debris was removed. Then the
sample was applied to a 2M sucrose cushion to isolate polysomes. The RNA was isolated by treatment with detergents
and proteinase K followed by ethanol precipitation and centrifugation. The polysomal RNA from the different tissues
was pooled according to the following mass ratios: 15/15/1 for male inflorescences, female inflorescences and root,
respectively. The pooled material was then used for cDNA synthesis by the methods described below.
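
The pooling arithmetic can be sketched as follows. This is only an illustrative helper, assuming a chosen total mass of pooled RNA; the function name and the 31 µg total are hypothetical and do not come from the procedure above, which states only the 15/15/1 mass ratio.

```python
def pool_masses(total_ug: float, ratios: dict) -> dict:
    """Split a total RNA mass among tissues according to integer mass ratios."""
    total_parts = sum(ratios.values())
    return {tissue: total_ug * parts / total_parts
            for tissue, parts in ratios.items()}


# Mass ratios stated in the text for the corn polysomal RNA pool.
CORN_RATIOS = {"male inflorescence": 15, "female inflorescence": 15, "root": 1}

if __name__ == "__main__":
    # 31 ug is an arbitrary total chosen so that one "part" equals 1 ug.
    for tissue, mass in pool_masses(31.0, CORN_RATIOS).items():
        print(f"{tissue}: {mass:.1f} ug")
```

The same helper would cover a 9/1 inflorescence-to-root pool by passing a different ratio dictionary.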

[2314] Starting material for cDNA synthesis for the exemplary corn cDNA clones with sequences presented in REF
AND SEQ TABLES 1 AND 2 was poly(A)-containing polysomal mRNAs from inflorescences and root tissues of corn
plants grown from HYBRID SEED # 35A19. Male inflorescences and female (pre- and post-fertilization) inflorescences
were isolated at various stages of development. Selection for poly(A) containing polysomal RNA was done using oligo
d(T) cellulose columns, as described by Cox and Goldberg, "Plant Molecular Biology: A Practical Approach", pp. 1-35,
Shaw ed., c. 1988 by IRL, Oxford. The quality and the integrity of the polyA+ RNAs were evaluated.

[2315] Starting material for cDNA synthesis for the exemplary Arabidopsis cDNA clones with sequences presented
in REF AND SEQ TABLES 1 AND 2 was polysomal RNA isolated from the top-most inflorescence tissues of Arabidopsis
thaliana Wassilewskija (Ws.) and from roots of Arabidopsis thaliana Landsberg erecta (L. er.), also obtained from the
Arabidopsis Biological Resource Center. Nine parts inflorescence to every part root was used, as measured by wet
mass. Tissue was pulverised and exposed to liquid nitrogen. Next, the sample was homogenised in the presence of
detergents and then centrifuged. The debris and nuclei were removed from the sample and more detergents were
added to the sample. The sample was centrifuged and the debris was removed, and the sample was applied to a 2M
sucrose cushion to isolate polysomal RNA (Cox et al., "Plant Molecular Biology: A Practical Approach", pp. 1-35, Shaw
ed., c. 1988 by IRL, Oxford). The polysomal RNA was used for cDNA synthesis by the methods described below.
Polysomal mRNA was then isolated as described above for corn cDNA. The quality of the RNA was assessed
electrophoretically.

[2316] Following preparation of the mRNAs from various tissues as described above, selection of mRNA with intact
5' ends and specific attachment of an oligonucleotide tag to the end of such mRNA was performed using either a
chemical or enzymatic approach. Both techniques take advantage of the presence of the "cap" structure, which
characterises the 5' end of most intact mRNAs and which comprises a guanosine generally methylated once, at the 7
position.

[2317] The chemical modification approach involves the optional elimination of the 2',3'-cis diol of the 3' terminal
ribose, the oxidation of the 2',3'-cis diol of the ribose linked to the cap of the 5' ends of the mRNAs into a dialdehyde,
and the coupling of the dialdehyde so obtained to a derivatized oligonucleotide tag. Further detail regarding the
chemical approaches for obtaining mRNAs having intact 5' ends is disclosed in International Application No.
WO98/34981 published November 7, 1996.

[2318] The enzymatic approach for ligating the oligonucleotide tag to the intact 5' ends of mRNAs involves the removal
of the phosphate groups present on the 5' ends of uncapped incomplete mRNAs, the subsequent decapping of mRNAs
having intact 5' ends and the ligation of the phosphate present at the 5' end of the decapped mRNA to an oligonucleotide
tag. Further detail regarding the enzymatic approaches for obtaining mRNAs having intact 5' ends is disclosed in
Dumas Milne Edwards J.B. (Doctoral Thesis of Paris VI University, Le clonage des ADNc complets: difficultés et
perspectives nouvelles. Apports pour l'étude de la régulation de l'expression de la tryptophane hydroxylase de rat, 20
Dec. 1993), EPO 625572 and Kato et al., Gene 150:243-250 (1994).

[2319] In both the chemical and the enzymatic approach, the oligonucleotide tag has a restriction enzyme site (e.g.,
an EcoRI site) therein to facilitate later cloning procedures. Following attachment of the oligonucleotide tag to the
mRNA, the integrity of the mRNA is examined by performing a Northern blot using a probe complementary to the
oligonucleotide tag.

[2320] For the mRNAs joined to oligonucleotide tags using either the chemical or the enzymatic method, first strand
cDNA synthesis is performed using an oligo-dT primer with reverse transcriptase. This oligo-dT primer can contain an
internal tag of at least 4 nucleotides, which can be different from one mRNA preparation to another. Methylated dCTP
is used for cDNA first strand synthesis to protect the internal EcoRI sites from digestion during subsequent steps. The
first strand cDNA is precipitated using isopropanol after removal of RNA by alkaline hydrolysis to eliminate residual
primers.

[2321] Second strand cDNA synthesis is conducted using a DNA polymerase, such as Klenow fragment, and a primer
corresponding to the 5' end of the ligated oligonucleotide. The primer is typically 20-25 bases in length. Methylated
dCTP is used for second strand synthesis in order to protect internal EcoRI sites in the cDNA from digestion during
the cloning process.

[2322] Following second strand synthesis, the full-length cDNAs are cloned into a phagemid vector, such as
pBlueScript™ (Stratagene). The ends of the full-length cDNAs are blunted with T4 DNA polymerase (Biolabs) and the
cDNA is digested with EcoRI. Since methylated dCTP is used during cDNA synthesis, the EcoRI site present in the tag
is the only hemi-methylated site; hence the only site susceptible to EcoRI digestion. In some instances, to facilitate
subcloning, a Hind III adapter is added to the 3' end of full-length cDNAs.
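
To illustrate why internal EcoRI sites need protection, the following minimal Python sketch locates every EcoRI recognition sequence (GAATTC) in a tagged cDNA and labels each as lying in the tag or in the insert. The tag and cDNA sequences are hypothetical, and the sketch only locates sites; it does not model methylation or digestion.

```python
ECORI_SITE = "GAATTC"  # EcoRI recognition sequence (palindromic on both strands)


def ecori_sites(seq: str) -> list:
    """Return 0-based start positions of every GAATTC occurrence in seq."""
    seq = seq.upper()
    last_start = len(seq) - len(ECORI_SITE)
    return [i for i in range(last_start + 1) if seq.startswith(ECORI_SITE, i)]


def classify_sites(tag: str, cdna: str) -> dict:
    """Label each site in tag+cdna as 'tag' (starts inside the tag) or 'internal'."""
    labels = {}
    for pos in ecori_sites(tag + cdna):
        labels[pos] = "tag" if pos < len(tag) else "internal"
    return labels


if __name__ == "__main__":
    # Hypothetical sequences, invented for illustration only.
    tag = "CTGAATTCAG"
    cdna = "ATGGCGAATTCTTAA"
    for pos, label in classify_sites(tag, cdna).items():
        print(f"EcoRI site at position {pos}: {label}")
```

In the procedure described above, only the site labeled "tag" is expected to be cut, because the sites labeled "internal" are synthesized with methylated dCTP.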

[2323] The full-length cDNAs are then size fractionated using either exclusion chromatography (AcA, Biosepra) or
electrophoretic separation, which yields 3 to 8 different fractions. The full-length cDNAs are then directionally cloned
into pBlueScript™ using either the EcoRI and SmaI restriction sites or, when the Hind III adapter is present in
the full-length cDNAs, the EcoRI and Hind III restriction sites. The ligation mixture is transformed, preferably by
electroporation, into bacteria, which are then propagated under appropriate antibiotic selection.

[2324] Clones containing the oligonucleotide tag attached to full-length cDNAs are selected as follows.

[2325] The plasmid cDNA libraries made as described above are purified (e.g., by a column available from Qiagen).
A positive selection of the tagged clones is performed as follows. Briefly, in this selection procedure, the plasmid DNA
is converted to single stranded DNA using phage F1 gene II endonuclease in combination with an exonuclease (Chang
et al., Gene 127:95 (1993)) such as exonuclease III or T7 gene 6 exonuclease. The resulting single stranded DNA is
then purified using paramagnetic beads as described by Fry et al., Biotechniques 13:124 (1992). Here the single
stranded DNA is hybridized with a biotinylated oligonucleotide having a sequence corresponding to the 3' end of the
oligonucleotide tag. Preferably, the primer has a length of 20-25 bases. Clones including a sequence complementary
to the biotinylated oligonucleotide are selected by incubation with streptavidin coated magnetic beads followed by
magnetic capture. After capture of the positive clones, the plasmid DNA is released from the magnetic beads and
converted into double stranded DNA using a DNA polymerase such as ThermoSequenase™ (obtained from Amersham
Pharmacia Biotech). Alternatively, protocols such as the Gene Trapper™ kit (Gibco BRL) can be used. The double
stranded DNA is then transformed, preferably by electroporation, into bacteria. The percentage of positive clones
having the 5' tag oligonucleotide is typically estimated to be between 90 and 98% from dot blot analysis.

[2326] Following transformation, the libraries are ordered in microtiter plates and sequenced. The Arabidopsis library
was deposited at the American Type Culture Collection on January 7, 2000 as "E-coli liba 010600" under the accession
number PTA-1161.

[2327] The SDFs of the invention can be used in Southern hybridizations as described above. The following describes
extraction of DNA from nuclei of plant cells, digestion of the nuclear DNA and separation by length, transfer of the
separated fragments to membranes, preparation of probes for hybridization, hybridization and detection of the
hybridized probe.

[2328] The procedures described herein can be used to isolate related polynucleotides or for diagnostic purposes.
Moderate stringency hybridization conditions, as defined above, are described in the present example. These
conditions result in detection of hybridization between sequences having at least 70% sequence identity. As described
above, the hybridization and wash conditions can be changed to reflect the desired percentage of sequence identity
between probe and target sequences that can be detected.
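
As a rough illustration of what a 70% sequence identity threshold means, the sketch below computes percent identity over two pre-aligned sequences. The convention used here (gap columns are excluded from the comparison) is an assumption, as are the function name and the example sequences; the text above does not specify how identity is calculated.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity over aligned, ungapped columns of two aligned sequences.

    Both inputs must already be aligned to the same length; '-' marks a gap.
    Columns containing a gap in either sequence are skipped (an assumed convention).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(x, y) for x, y in zip(a.upper(), b.upper())
               if x != "-" and y != "-"]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


if __name__ == "__main__":
    # Hypothetical probe and target fragments, for illustration only.
    probe = "ACGTACGTAC"
    target = "ACGTACGTTT"
    identity = percent_identity(probe, target)
    print(f"{identity:.1f}% identity; meets 70% threshold: {identity >= 70.0}")
```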

[2329] In the following procedure, a probe for hybridization is produced from two PCR reactions using two primers
from genomic sequence of Arabidopsis thaliana. As described above, the particular template for generating the probe
can be any desired template.

[2330] The first PCR product is assessed to confirm that it is of the expected size. Then
the product of the first PCR is used as a template, with the same pair of primers used in the first PCR, in a second
PCR that produces a labeled product used as the probe.

[2331] Fragments detected by hybridization, or other bands of interest, can be isolated from gels used to separate
genomic DNA fragments by known methods for further purification and/or characterization.

Buffers for nuclear DNA extraction

[2332]

1. 10X HB

                              1000 ml
   40 mM spermidine           10.2 g         Spermine (Sigma S-2876) and spermidine (Sigma S-2501)
   10 mM spermine             3.5 g          stabilize chromatin and the nuclear membrane
   0.1 M EDTA (disodium)      37.2 g         EDTA inhibits nuclease
   [illegible in source]      [illegible]    Buffer
   0.8 M KCl                  59.6 g         Adjusts ionic strength for stability of nuclei

   Adjust pH to 9.5 with 10 N NaOH. It appears that there is a nuclease present in leaves. Use of pH 9.5 appears
   to inactivate this nuclease.
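
The gram amounts in the 10X HB recipe follow from molarity times formula weight times volume. The sketch below reproduces them; the formula weights and the salt forms they correspond to (spermidine trihydrochloride, spermine tetrahydrochloride, EDTA disodium dihydrate) are assumptions inferred from the listed masses, not values stated in the recipe.

```python
# Formula weights in g/mol. The salt forms are assumptions chosen because they
# reproduce the gram amounts listed in the recipe; the source does not name them.
FORMULA_WEIGHTS = {
    "spermidine": 254.63,  # assumed: spermidine trihydrochloride
    "spermine": 348.18,    # assumed: spermine tetrahydrochloride
    "EDTA": 372.24,        # assumed: EDTA disodium salt dihydrate
    "KCl": 74.55,
}

# Final molar concentrations in 10X HB, as listed in the recipe.
TEN_X_HB_MOLARITY = {"spermidine": 0.040, "spermine": 0.010, "EDTA": 0.1, "KCl": 0.8}


def grams_needed(molarity: float, formula_weight: float, volume_l: float = 1.0) -> float:
    """Mass of solute (g) = concentration (mol/l) x formula weight (g/mol) x volume (l)."""
    return molarity * formula_weight * volume_l


if __name__ == "__main__":
    for name, molarity in TEN_X_HB_MOLARITY.items():
        grams = grams_needed(molarity, FORMULA_WEIGHTS[name])
        print(f"{name}: {grams:.1f} g per 1000 ml")
```

Passing a different `volume_l` scales the recipe, for example 0.5 for a 500 ml batch.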



2. 2 M sucrose (684 g per 1000 ml)

   Heat about half the final volume of water to about 50°C. Add the sucrose slowly, then bring the mixture to close
   to final volume; stir constantly until it has dissolved. Bring the solution to volume.



3. Sarkosyl solution (lyses nuclear membranes)

                                     1000 ml
   N-lauroyl sarcosine (Sarkosyl)    [illegible in source]
   0.1 M Tris                        12.1 g
   0.04 M EDTA (disodium)            14.9 g

   Adjust the pH to 9.5 after all the components are dissolved and bring up to the proper volume.



4. 20% Triton X-100 

   80 ml Triton X-100

   320 ml 1X HB (w/o β-ME and PMSF)

   Prepare in advance; Triton takes some time to dissolve.



[2333] 

1. Prepare 1X "H" buffer (keep ice-cold during use)

                            1000 ml
   Water                    634 ml

   Added just before use:
   100 mM PMSF*             10 ml      a protease inhibitor; protects nuclear membrane proteins
   β-mercaptoethanol        1 ml       inactivates nuclease by reducing disulfide bonds

   *100 mM PMSF (phenyl methyl sulfonyl fluoride, Sigma P-7626): add 0.0875 g to 5 ml 100% ethanol.


2. Homogenize the tissue in a blender (use 300-400 ml of 1X HB per blender). Be sure that you use 5-10 ml of HB
buffer per gram of tissue. Blenders generate heat, so be sure to keep the homogenate cold. It is necessary to put
the blenders in ice periodically.

3. Add the 20% Triton X-100 (25 ml per liter of homogenate) and gently stir on ice for 20 min. This lyses plastid,
but not nuclear, membranes.

4. Filter the tissue suspension through several nylon filters into an ice-cold beaker. The first filtration is through a
250-micron membrane; the second is through an 85-micron membrane; the third is through a 50-micron membrane;
and the fourth is through a 20-micron membrane. Use a large funnel to hold the filters. Filtration can be sped up
by gently squeezing the liquid through the filters.

5. Centrifuge the filtrate at 1200 x g for 20 min. at 4°C to pellet the nuclei.

6. Discard the dark green supernatant. The pellet will have several layers to it. One is starch; it is white and gritty.
The nuclei are gray and soft. In the early steps, there may be a dark green and somewhat viscous layer of
chloroplasts.

   Wash the pellets in about 25 ml cold H buffer (with Triton X-100) and resuspend by swirling gently and pipetting.
   After the pellets are resuspended, pellet the nuclei again at 1200-1300 x g. Discard the supernatant.

   Repeat the wash 3-4 times until the supernatant has changed from a dark green to a pale green. This usually
   happens after 3 or 4 resuspensions. At this point, the pellet is typically grayish white and very slippery. The
   Triton X-100 in these repeated steps helps to destroy the chloroplasts and mitochondria that contaminate the
   prep. Resuspend the nuclei for a final time in a total of 15 ml of H buffer and transfer the suspension to a sterile
   125 ml Erlenmeyer flask.

7. Add 15 ml, dropwise, cold 2% Sarkosyl, 0.1 M Tris, 0.04 M EDTA solution (pH 9.5) while swirling gently. This
lyses the nuclei. The solution will become very viscous.

8. Add 30 grams of CsCl and gently swirl at room temperature until the CsCl is in solution. The mixture will be gray,
white and viscous.

9. Centrifuge the solution at 11,400 x g at 4°C for at least 30 min. The longer this spin is, the firmer the protein
pellicle.

10. The result is typically a clear green supernatant over a white pellet, and (perhaps) under a protein pellicle.
Carefully remove the solution under the protein pellicle and above the pellet. Determine the density of the solution
by weighing 1 ml of solution and add CsCl if necessary to bring to 1.57 g/ml. The solution contains dissolved solids
(sucrose etc.) and the refractive index alone will not be an accurate guide to CsCl concentration.

11. Add 20 µl of 10 mg/ml EtBr per ml of solution.

12. Centrifuge at 184,000 x g for 16 to 20 hours in a fixed-angle rotor.

13. Remove the dark red supernatant that is at the top of the tube with a plastic transfer pipette and discard.
Carefully remove the DNA band with another transfer pipette. The DNA band is usually visible in room light;
otherwise, use a long wave UV light to locate the band.

14. Extract the ethidium bromide with isopropanol saturated with water and salt. Once the solution is clear, extract
at least two more times to ensure that all of the EtBr is gone. Be very gentle, as it is very easy to shear the DNA
at this step. This extraction may take a while because the DNA solution tends to be very viscous. If the solution is
too viscous, dilute it with TE.

15. Dialyze the DNA for at least two days against several changes (at least three times) of TE (10 mM Tris, 1 mM
EDTA, pH 8) to remove the cesium chloride.

16. Remove the dialyzed DNA from the tubing. If the dialyzed DNA solution contains a lot of debris, centrifuge the
DNA solution at least at 2500 x g for 10 min. and carefully transfer the clear supernatant to a new tube. Read the
A260 concentration of the DNA.

17. Assess the quality of the DNA by agarose gel electrophoresis (1% agarose gel) of the DNA. Load 50 ng and
100 ng (based on the OD reading) and compare it with known and good quality DNA. Undigested lambda DNA
and a lambda-Hind III-digested DNA are good molecular weight markers.
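
Several steps of the extraction scale with the amount of material. The following Python sketch collects those proportional volumes in one place, assuming the per-gram, per-liter and per-ml figures given in steps 2, 3 and 11; the function names and the example quantities are hypothetical.

```python
def hb_volume_range_ml(tissue_g: float) -> tuple:
    """Step 2: 5-10 ml of HB buffer per gram of tissue (min, max)."""
    return (5.0 * tissue_g, 10.0 * tissue_g)


def triton_volume_ml(homogenate_ml: float) -> float:
    """Step 3: 25 ml of 20% Triton X-100 per liter of homogenate."""
    return 25.0 * homogenate_ml / 1000.0


def etbr_volume_ul(solution_ml: float) -> float:
    """Step 11: 20 ul of 10 mg/ml EtBr per ml of CsCl solution."""
    return 20.0 * solution_ml


if __name__ == "__main__":
    # Example quantities are arbitrary and chosen only to show the arithmetic.
    low, high = hb_volume_range_ml(40.0)
    print(f"40 g tissue: {low:.0f}-{high:.0f} ml 1X HB")
    print(f"400 ml homogenate: {triton_volume_ml(400.0):.1f} ml 20% Triton X-100")
    print(f"30 ml CsCl solution: {etbr_volume_ul(30.0):.0f} ul EtBr")
```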

Protocol for Digestion of Genomic DNA 

Protocol:

[2334]

1. The relative amounts of DNA for different crop plants that provide approximately a balanced number of genome
equivalents are given in Table 3. Note that due to the size of the wheat genome, wheat DNA will be underrepresented.
Lambda DNA provides a useful control for complete digestion.

2. Precipitate the DNA by adding 3 volumes of 100% ethanol. Incubate at -20°C for at least two hours. Yeast DNA
can be purchased and made up at the necessary concentration; therefore no precipitation is necessary for yeast
DNA.

3. Centrifuge the solution at 11,400 x g for 20 min. Decant the ethanol carefully (be careful not to disturb the pellet).
Be sure that the residual ethanol is completely removed either by vacuum desiccation or by carefully wiping the
sides of the tubes with a clean tissue.

4. Resuspend the pellet in an appropriate volume of water. Be sure the pellet is fully resuspended before proceeding
to the next step. This may take about 30 min.

5. Add the appropriate volume of 10X reaction buffer provided by the manufacturer of the restriction enzyme to
the resuspended DNA, followed by the appropriate volume of enzymes. Be sure to mix it properly by slowly swirling
the tubes.

6. Set up the lambda digestion-control for each DNA that you are digesting.

7. Incubate both the experimental and lambda digests overnight at 37°C. Spin down condensation in a microfuge
before proceeding.

8. After digestion, add 2 µl of loading dye (typically 0.25% bromophenol blue, 0.25% xylene cyanol in 15% Ficoll
or 30% glycerol) to the lambda-control digests and load in 1% TPE-agarose gel (TPE is 90 mM Tris-phosphate, 2
mM EDTA, pH 8). If the lambda DNA in the lambda control digests is completely digested, proceed with the
precipitation of the genomic DNA in the digests.

9. Precipitate the digested DNA by adding 3 volumes of 100% ethanol and incubating at -20°C for at least 2 hours
(preferably overnight).

   EXCEPTION: Arabidopsis and yeast DNA are digested in an appropriate volume; they don't have to be precipitated.

10. Resuspend the DNA in an appropriate volume of TE (e.g., 22 µl x 50 blots = 1100 µl) and an appropriate volume
of 10X loading dye (e.g., 2.4 µl x 50 blots = 120 µl). Be careful in pipetting the loading dye; it is viscous. Be sure
you are pipetting the correct volume.
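
The resuspension volumes in step 10 scale linearly with the number of blots. A minimal sketch, assuming the per-blot figures from the worked example in step 10 (22 µl TE and 2.4 µl of 10X loading dye); the function name is hypothetical:

```python
def resuspension_volumes_ul(n_blots: int,
                            te_per_blot_ul: float = 22.0,
                            dye_per_blot_ul: float = 2.4) -> dict:
    """Total TE and 10X loading dye volumes (ul) for a given number of blots."""
    if n_blots < 0:
        raise ValueError("number of blots cannot be negative")
    return {
        "TE": n_blots * te_per_blot_ul,
        "10X loading dye": n_blots * dye_per_blot_ul,
    }


if __name__ == "__main__":
    for reagent, volume in resuspension_volumes_ul(50).items():
        print(f"{reagent}: {volume:.0f} ul for 50 blots")
```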



Table 3

Some guide points in digesting genomic DNA.

Species       Genome Size   Size Relative     Genome Equivalent to    Amount of DNA
                            to Arabidopsis    2 ug Arabidopsis DNA    per blot
Arabidopsis   120 Mb        1X                1X
Brassica      1,100 Mb      9.2X              0.54X
Corn          2,800 Mb      23.3X             0.43X                   20 ug
Cotton        2,300 Mb      19.2X             0.52X                   20 ug
Oat           11,300 Mb     94X               0.11X                   20 ug
Rice          400 Mb        3.3X              0.75X                   5 ug
Soybean       1,100 Mb      9.2X              0.54X                   10 ug
Sugarbeet     758 Mb        6.3X              0.8X                    10 ug
Sweetclover   1,100 Mb      9.2X              0.54X                   10 ug
Wheat         16,000 Mb     133X              0.08X                   20 ug
Yeast         15 Mb         0.12X             1X                      0.25 ug
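
The columns of Table 3 appear to be related by simple arithmetic: the relative size is the genome size divided by the 120 Mb Arabidopsis genome, and the genome equivalent is the DNA loaded per blot divided by the amount that would match 2 ug of Arabidopsis DNA. The following Python sketch assumes that relationship; the function names are illustrative and do not come from the protocol.

```python
ARABIDOPSIS_MB = 120.0     # reference genome size from Table 3
REFERENCE_LOAD_UG = 2.0    # Arabidopsis DNA per blot taken as the 1X reference


def relative_size(genome_mb):
    """Genome size expressed as a multiple of the Arabidopsis genome."""
    return genome_mb / ARABIDOPSIS_MB


def genome_equivalent(amount_ug, genome_mb):
    """Genome copies loaded, relative to 2 ug of Arabidopsis DNA."""
    return amount_ug / (REFERENCE_LOAD_UG * relative_size(genome_mb))


def amount_for_equivalent(equivalent, genome_mb):
    """DNA (ug) needed to load a given genome equivalent."""
    return equivalent * REFERENCE_LOAD_UG * relative_size(genome_mb)


if __name__ == "__main__":
    for species, size_mb, load_ug in [("Corn", 2800, 20), ("Wheat", 16000, 20), ("Yeast", 15, 0.25)]:
        print(species, round(relative_size(size_mb), 2), round(genome_equivalent(load_ug, size_mb), 2))
```

Small differences from the tabulated values are expected, because the table rounds the relative size before computing the equivalent.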



Protocol for Southern Blot Analysis 



[2335] The digested DNA samples are electrophoresed in 1% agarose gels in 1X TPE buffer. Low voltage, overnight
separations are preferred. The gels are stained with EtBr and photographed.

1. For blotting the gels, first incubate the gel in 0.25 N HCl (with gentle shaking) for about 15 min.

2. Then briefly rinse with water. The DNA is denatured by 2 incubations. Incubate (with shaking) in 0.5 M NaOH
in 1.5 M NaCl for 15 min.

3. The gel is then briefly rinsed in water and neutralized by incubating twice (with shaking) in 1.5 M Tris pH 7.5 in
1.5 M NaCl for 15 min.

4. A nylon membrane is prepared by soaking it in water for at least 5 min, then in 6X SSC for at least 15 min
before use. (20X SSC is 175.3 g NaCl, 88.2 g sodium citrate per liter, adjusted to pH 7.0.)

5. The nylon membrane is placed on top of the gel and all bubbles in between are removed. The DNA is blotted
from the gel to the membrane using an absorbent medium, such as paper toweling, and 6X SSC buffer. After the
transfer, the membrane may be lightly brushed with a gloved hand to remove any agarose sticking to the surface.

6. The DNA is then fixed to the membrane by UV crosslinking and baking at 80°C. The membrane is stored at 4°C
until use.
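
The blotting steps call for 6X SSC, while the recipe given is for a 20X stock. The following Python sketch shows the usual C1 x V1 = C2 x V2 dilution arithmetic; the 500 ml batch size is an arbitrary example, not a volume taken from the protocol.

```python
def stock_volume_ml(stock_x, final_x, final_volume_ml):
    """Volume of concentrated stock needed for a diluted working solution."""
    if final_x > stock_x:
        raise ValueError("working strength cannot exceed the stock strength")
    return final_x * final_volume_ml / stock_x


if __name__ == "__main__":
    batch_ml = 500.0                      # hypothetical batch size
    ssc_ml = stock_volume_ml(20, 6, batch_ml)
    water_ml = batch_ml - ssc_ml
    print(f"{ssc_ml} ml 20X SSC + {water_ml} ml water")
```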



B. Protocol for PCR Amplification of Genomic Fragments in Arabidopsis

Amplification procedures:

[2336]

1. Mix the following in a 0.20 ml PCR tube or 96-well PCR plate:

Volume       Stock                                              Final Amount or Conc.
0.5 ul       ~10 ng/ul genomic DNA¹                             5 ng
2.5 ul       10X PCR buffer                                     20 mM Tris, 50 mM KCl
0.75 ul      50 mM MgCl2                                        1.5 mM
1 ul         10 pmol/ul Primer 1 (Forward)                      10 pmol
1 ul         10 pmol/ul Primer 2 (Reverse)                      10 pmol
0.5 ul       5 mM dNTPs                                         0.1 mM
0.1 ul       5 units/ul Platinum Taq™ (Life Technologies,
             Gaithersburg, MD) DNA Polymerase
(to 25 ul)   Water
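
The final concentrations in the reaction table follow from the stock concentration, the volume added, and the 25 ul total. The following Python sketch checks that arithmetic and works out the water volume; the data layout and function names are illustrative assumptions.

```python
TOTAL_UL = 25.0
BUFFER_UL = 2.5  # 10X PCR buffer

# (component, volume in ul, stock concentration, unit of the stock)
MIX = [
    ("genomic DNA", 0.5, 10.0, "ng/ul"),
    ("MgCl2", 0.75, 50.0, "mM"),
    ("Primer 1 (forward)", 1.0, 10.0, "pmol/ul"),
    ("Primer 2 (reverse)", 1.0, 10.0, "pmol/ul"),
    ("dNTPs", 0.5, 5.0, "mM"),
    ("Platinum Taq", 0.1, 5.0, "units/ul"),
]


def final_concentration(stock, volume_ul, total_ul=TOTAL_UL):
    """Concentration after dilution into the full reaction volume."""
    return stock * volume_ul / total_ul


def amount_added(stock_per_ul, volume_ul):
    """Absolute amount delivered (ng, pmol or units)."""
    return stock_per_ul * volume_ul


def water_volume(total_ul=TOTAL_UL):
    """Water needed to bring the reaction to the total volume."""
    return total_ul - BUFFER_UL - sum(volume for _, volume, _, _ in MIX)


if __name__ == "__main__":
    print("MgCl2 final (mM):", final_concentration(50.0, 0.75))
    print("Template (ng):", amount_added(10.0, 0.5))
    print("Water (ul):", round(water_volume(), 2))
```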





2. The template DNA is amplified using a Perkin Elmer 9700 PCR machine:

1) 94°C for 10 min, followed by

2) 5 cycles:    94°C - 30 sec
                62°C - 30 sec
                72°C - 3 min

3) 5 cycles:    94°C - 30 sec
                58°C - 30 sec
                72°C - 3 min

4) 25 cycles:   94°C - 30 sec
                53°C - 30 sec
                72°C - 3 min

5) 72°C for 7 min. Then the reactions are stopped by chilling to 4°C.
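
The cycling program steps the annealing temperature down from 62°C to 58°C to 53°C over three stages. The following Python sketch represents the program as data and estimates the programmed hold time; the data structure is an illustrative assumption, and ramp times and the final 4°C hold are not counted.

```python
# (stage name, number of cycles, [(temperature in deg C, seconds), ...])
PROGRAM = [
    ("initial denaturation", 1, [(94, 600)]),
    ("stage 2", 5, [(94, 30), (62, 30), (72, 180)]),
    ("stage 3", 5, [(94, 30), (58, 30), (72, 180)]),
    ("stage 4", 25, [(94, 30), (53, 30), (72, 180)]),
    ("final extension", 1, [(72, 420)]),
]


def total_seconds(program):
    """Sum of all programmed hold times, excluding ramping."""
    return sum(cycles * sum(sec for _, sec in steps) for _, cycles, steps in program)


def annealing_temperatures(program):
    """Middle-step temperature of every three-step stage."""
    return [steps[1][0] for _, _, steps in program if len(steps) == 3]


if __name__ == "__main__":
    print("Programmed time (min):", total_seconds(PROGRAM) / 60)
    print("Annealing temperatures:", annealing_temperatures(PROGRAM))
```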



[2337] The procedure can be adapted to a multi-well format if necessary.

Quantification and Dilution of PCR Products: 

[2338] 

1. The product of the PCR is analyzed by electrophoresis in a 1% agarose gel. A linearized plasmid DNA can be
used as a quantification standard (usually at 50, 100, 200, and 400 ng). These will be used as references to
approximate the amount of PCR products. HindIII-digested Lambda DNA is useful as a molecular weight marker.
The gel can be run fairly quickly, e.g., at 100 volts. The standard gel is examined to determine that the size of the
PCR products is consistent with the expected size and whether there are significant extra bands or smeary products in
the PCR reactions.

2. The amounts of PCR products can be estimated on the basis of the plasmid standard.

3. For the small number of reactions that produce extraneous bands, a small amount of DNA from bands with the
correct size can be isolated by dipping a sterile 10-ul tip into the band while viewing through a UV Transilluminator.
The small amount of agarose gel (with the DNA fragment) is used in the labeling reaction.

C. Protocol for PCR-DIG-Labeling of DNA

Solutions:

[2339]

Reagents in PCR reactions (diluted PCR products, 10X PCR Buffer, 50 mM MgCl2, 5 U/ul Platinum Taq Polymerase,
and the primers)

10X dNTP + DIG-11-dUTP [1:5]: (2 mM dATP, 2 mM dCTP, 2 mM dGTP, 1.65 mM dTTP, 0.35 mM DIG-11-dUTP)

10X dNTP + DIG-11-dUTP [1:10]: (2 mM dATP, 2 mM dCTP, 2 mM dGTP, 1.81 mM dTTP, 0.19 mM DIG-11-dUTP)

10X dNTP + DIG-11-dUTP [1:15]: (2 mM dATP, 2 mM dCTP, 2 mM dGTP, 1.875 mM dTTP, 0.125 mM DIG-11-dUTP)

TE buffer (10 mM Tris, 1 mM EDTA, pH 8)

Maleate buffer: In 700 ml of deionized distilled water, dissolve 11.61 g maleic acid and 8.77 g NaCl. Add NaOH to
adjust the pH to 7.5. Bring the volume to 1 L. Stir for 15 min and sterilize.

10% blocking solution: In 80 ml deionized distilled water, dissolve 1.16 g maleic acid. Next, add NaOH to adjust
the pH to 7.5. Add 10 g of the blocking reagent powder (Boehringer Mannheim, Indianapolis, IN, Cat. no. 1096176).
Heat to 60°C while stirring to dissolve the powder. Adjust the volume to 100 ml with water. Stir and sterilize.

1% blocking solution: Dilute the 10% stock to 1% using the maleate buffer.

Buffer 3 (100 mM Tris, 100 mM NaCl, 50 mM MgCl2, pH 9.5): Prepared from autoclaved solutions of 1 M Tris pH
9.5, 5 M NaCl, and 1 M MgCl2 in autoclaved distilled water.
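
The gram amounts in the maleate buffer recipe can be converted to molar concentrations as a sanity check. The following Python sketch assumes molecular weights of about 116.07 g/mol for maleic acid and 58.44 g/mol for NaCl; these are standard reference values supplied here for illustration, not figures stated in the protocol.

```python
MW_MALEIC_ACID = 116.07  # g/mol, assumed reference value
MW_NACL = 58.44          # g/mol, assumed reference value


def molarity(grams, mol_weight, volume_l=1.0):
    """Molar concentration of a solute dissolved to the given final volume."""
    return grams / mol_weight / volume_l


if __name__ == "__main__":
    print("Maleic acid (M):", round(molarity(11.61, MW_MALEIC_ACID), 3))
    print("NaCl (M):", round(molarity(8.77, MW_NACL), 3))
    # 10% blocking solution: 1.16 g maleic acid in a final volume of 100 ml
    print("Maleic acid in blocking stock (M):", round(molarity(1.16, MW_MALEIC_ACID, 0.1), 3))
```

Under these assumptions the recipe works out to roughly 0.1 M maleic acid and 0.15 M NaCl.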

Procedure:



[2340]

1. PCR reactions are performed in 25 ul volumes, containing:

PCR buffer                      1X
MgCl2                           1.5 mM
10X dNTP + DIG-11-dUTP          1X (please see the note below)
Platinum Taq™ Polymerase        1 unit
10 pg probe DNA
10 pmol primer 1

                                Use for:
10X dNTP + DIG-11-dUTP (1:5)    < 1 kb
10X dNTP + DIG-11-dUTP (1:10)   1 kb to 1.8 kb
10X dNTP + DIG-11-dUTP (1:15)   > 1.8 kb
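
The choice of labeling mix depends only on probe length. The following Python sketch encodes that rule together with the dTTP and DIG-11-dUTP concentrations listed for each 10X mix; the function name and the treatment of exactly 1 kb and 1.8 kb as belonging to the middle range are assumptions.

```python
# dTTP and DIG-11-dUTP concentrations (mM) in each 10X mix, as listed under Solutions
MIXES = {
    "1:5": {"dTTP": 1.65, "DIG-11-dUTP": 0.35},
    "1:10": {"dTTP": 1.81, "DIG-11-dUTP": 0.19},
    "1:15": {"dTTP": 1.875, "DIG-11-dUTP": 0.125},
}


def choose_dig_mix(probe_length_kb):
    """Return the label of the 10X dNTP + DIG-11-dUTP mix for a probe length."""
    if probe_length_kb <= 0:
        raise ValueError("probe length must be positive")
    if probe_length_kb < 1.0:
        return "1:5"
    if probe_length_kb <= 1.8:
        return "1:10"
    return "1:15"


if __name__ == "__main__":
    for kb in (0.6, 1.2, 2.5):
        label = choose_dig_mix(kb)
        print(kb, "kb ->", label, MIXES[label])
```

In each mix, dTTP and DIG-11-dUTP together add up to 2 mM, matching the other three nucleotides.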



2. The PCR reaction uses the following amplification cycles:

1) 94°C for 10 min.

2) 5 cycles:    95°C - 30 sec
                61°C - 1 min
                73°C - 5 min

3) 5 cycles:    95°C - 30 sec
                59°C - 1 min
                75°C - 5 min

4) 25 cycles:   95°C - 30 sec
                [illegible] - 1 min
                73°C - 5 min

5) 72°C for 8 min. The reactions are terminated by chilling to 4°C (hold).



3. The products are analyzed by electrophoresis in a 1% agarose gel, comparing to an aliquot of the unlabeled
probe starting material.

4. The amount of DIG-labeled probe is determined as follows:

Make serial dilutions of the diluted control DNA in dilution buffer (TE: 10 mM Tris and 1 mM EDTA, pH 8) as
shown in the following table:

DIG-labeled control DNA starting conc.   Stepwise Dilution    Final Conc. (Dilution Name)
5 ng/ul                                  1 ul in 49 ul TE     100 pg/ul (A)
100 pg/ul (A)                            25 ul in 25 ul TE    50 pg/ul (B)
50 pg/ul (B)                             25 ul in 25 ul TE    25 pg/ul (C)
25 pg/ul (C)                             20 ul in 30 ul TE    10 pg/ul (D)

a. Serial dilutions of a DIG-labeled standard DNA ranging from 100 pg to 10 pg are spotted onto a positively
charged nylon membrane, marking the membrane lightly with a pencil to identify each dilution.

b. Serial dilutions (e.g., 1:50, 1:2500, 1:10,000) of the newly labeled DNA probe are spotted.

c. The membrane is fixed by UV crosslinking.

d. The membrane is wetted with a small amount of maleate buffer and then incubated in 1% blocking solution
for 15 min at room temp.

e. The labeled DNA is then detected using alkaline phosphatase conjugated anti-DIG antibody (Boehringer
Mannheim, Indianapolis, IN, cat. no. 1093274) and an NBT substrate according to the manufacturer's instructions.

f. Spot intensities of the control and experimental dilutions are then compared to estimate the concentration
of the PCR-DIG-labeled probe.
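
The stepwise dilution of the control DNA can be checked with a short calculation. The following Python sketch assumes the four transfers listed in the dilution table; the entries for dilution C are garbled in the source text, so the 25 pg/ul value it produces follows from the arithmetic rather than from a legible table cell.

```python
def dilute(conc, transfer_ul, diluent_ul):
    """Concentration after mixing a transfer volume into a diluent volume."""
    return conc * transfer_ul / (transfer_ul + diluent_ul)


# (dilution name, ul transferred, ul of TE)
STEPS = [("A", 1, 49), ("B", 25, 25), ("C", 25, 25), ("D", 20, 30)]


def dilution_series(start_pg_per_ul, steps=STEPS):
    """Apply each dilution to the product of the previous one."""
    series = {}
    conc = start_pg_per_ul
    for name, transfer_ul, diluent_ul in steps:
        conc = dilute(conc, transfer_ul, diluent_ul)
        series[name] = conc
    return series


if __name__ == "__main__":
    # 5 ng/ul starting concentration expressed in pg/ul
    print(dilution_series(5000.0))
```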



D. Prehybridization and Hybridization of Southern Blots 

Solutions: 



100% Formamide: purchased from Gibco

20X SSC (1X = 0.15 M NaCl, 0.015 M Na3 citrate)
per L: 175 g NaCl
       87.5 g Na3 citrate·2H2O

20% Sarkosyl (N-lauroyl-sarcosine)

20% SDS (sodium dodecyl sulphate)

10% Blocking Reagent: In 80 ml deionized distilled water, dissolve 1.16 g maleic acid. Next, add NaOH to adjust
the pH to 7.5. Add 10 g of the blocking reagent powder. Heat to 60°C while stirring to dissolve the powder. Adjust
the volume to 100 ml with water. Stir and sterilize.



Prehybridization Mix:

Final Concentration   Components         Volume (per 100 ml)   Stock
50%                   Formamide          50 ml                 100%
                      SSC                25 ml                 20X
                      Sarkosyl           0.5 ml                20%
                      SDS                                      20%
2%                    Blocking Reagent   20 ml
                      Water
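
Several cells of the prehybridization mix table did not survive in the source text. For the rows where both the volume and the stock strength are legible, the final concentration can be worked out as in the following Python sketch; the 10% strength of the blocking reagent stock is taken from the Solutions list, and the 250 ml batch in the usage lines is an arbitrary example.

```python
def final_concentration(stock, volume_ml, total_ml=100.0):
    """Final strength of a component after dilution into the whole mix."""
    return stock * volume_ml / total_ml


def scale_volume(volume_per_100_ml, batch_ml):
    """Scale a per-100-ml volume to another batch size."""
    return volume_per_100_ml * batch_ml / 100.0


# (component, volume per 100 ml, stock strength, unit)
LEGIBLE_ROWS = [
    ("Formamide", 50.0, 100.0, "%"),
    ("SSC", 25.0, 20.0, "X"),
    ("Sarkosyl", 0.5, 20.0, "%"),
    ("Blocking Reagent", 20.0, 10.0, "%"),
]

if __name__ == "__main__":
    for name, volume_ml, stock, unit in LEGIBLE_ROWS:
        print(name, final_concentration(stock, volume_ml), unit,
              "| for 250 ml:", scale_volume(volume_ml, 250.0), "ml")
```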







[2342] 



1. Place the blot in a heat-sealable plastic bag and add an appropriate volume of prehybridization solution (30 ml/
100 cm2) at room temperature. Seal the bag with a heat sealer, avoiding bubbles as much as possible. Lay down
the bags in a large plastic tray (one tray can accommodate at least 4-5 bags). Ensure that the bags are lying flat
in the tray so that the prehybridization solution is evenly distributed throughout the bag. Incubate the blot for at
least 2 hours with gentle agitation using a waver shaker.

2. Denature DIG-labeled DNA probe by incubating for 10 min at 98°C using the PCR machine and immediately
cool it to 4°C.

3. Add probe to prehybridization solution (25 ng/ml; 30 ml = 750 ng total probe) and mix well but avoid foaming.
Bubbles may lead to background.

4. Pour off the prehybridization solution from the hybridization bags and add the new prehybridization and probe
solution mixture to the bags containing the membrane.

5. Incubate with gentle agitation for at least 16 hours.

6. Proceed to medium stringency post-hybridization wash:

Three times for 20 min each with gentle agitation using 1X SSC, 1% SDS at 60°C.

All wash solutions must be prewarmed to 60°C. Use about 100 ml of wash solution per membrane.

To avoid background, keep the membranes fully submerged to avoid drying in spots; agitate sufficiently to
avoid having membranes stick to one another.

7. After the wash, proceed to immunological detection and CSPD development.
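
The solution volume and probe amount in the hybridization steps scale with membrane area (30 ml per 100 cm2) and solution volume (25 ng/ml). The following Python sketch applies those two figures; the 10 cm x 15 cm membrane in the usage lines is a hypothetical example.

```python
PREHYB_ML_PER_100_CM2 = 30.0   # from step 1
PROBE_NG_PER_ML = 25.0         # from step 3


def prehyb_volume_ml(membrane_area_cm2):
    """Prehybridization solution volume for a membrane of the given area."""
    return PREHYB_ML_PER_100_CM2 * membrane_area_cm2 / 100.0


def probe_ng(solution_ml):
    """Total labeled probe needed for a given hybridization volume."""
    return PROBE_NG_PER_ML * solution_ml


if __name__ == "__main__":
    area = 10 * 15                         # hypothetical 10 cm x 15 cm membrane
    volume = prehyb_volume_ml(area)
    print(volume, "ml solution,", probe_ng(volume), "ng probe")
```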

E. Procedure for Immunological Detection with CSPD 



Solutions: 



[2343] 

Buffer 1: Maleic acid buffer (0.1 M maleic acid, 0.15 M NaCl; adjusted to pH 7.5 with NaOH)

Washing buffer: Maleic acid buffer with 0.3% (v/v) Tween 20.

Blocking stock solution: 10% blocking reagent in Buffer 1. Dissolve (10X concentration) blocking reagent powder
(Boehringer Mannheim, Indianapolis, IN, cat. no. 1096176) by constantly stirring
on a [illegible] heating block or heat in a microwave, autoclave and store at 4°C.

Buffer 2 (1X blocking solution): Dilute the stock solution 1:10 in Buffer 1.

Detection buffer: 0.1 M Tris, 0.1 M NaCl, pH 9.5

Procedure: 
[2344] 

1. After the post-hybridization wash the blots are briefly rinsed (1-5 min) in the maleate washing buffer with gentle
shaking.

2. Then the membranes are incubated for 30 min in Buffer 2 with gentle shaking.

3. Anti-DIG-AP conjugate (Boehringer Mannheim, Indianapolis, IN, cat. no. 1093274) at 75 mU/ml (1:10,000) in
Buffer 2 is used for detection. 75 ml of solution can be used for 3 blots.

4. The membrane is incubated for 30 min in the antibody solution with gentle shaking.

5. The membranes are washed twice in washing buffer with gentle shaking. About [illegible] ml is used per wash for
the blots.

6. The blots are equilibrated for 2-5 min in [illegible] ml detection buffer.

7. Dilute CSPD (1:200) in detection buffer. (This can be prepared ahead of time and stored in the dark at 4°C.)
The following steps must be done individually. Bags (one for detection and one for exposure) are generally cut
and ready before doing the following steps.

8. The blot is carefully removed from the detection buffer and excess liquid removed without drying the membrane.
The blot is immediately placed in a bag and 1.6 ml of CSPD solution is added. The CSPD solution can be spread
over the membrane. Bubbles present at the edge and on the surface of the blot are typically removed by gentle
rubbing. The membrane is incubated for 5 min in CSPD solution.

9. Excess liquid is removed and the membrane is blotted briefly (DNA side up) on Whatman 3MM paper. Do not
let the membrane dry completely.

10. Seal the damp membrane in a hybridization bag and incubate for 10 min at 37°C to enhance the luminescent
reaction.

11. Expose for 2 hours at room temperature to X-ray film. Multiple exposures can be taken. Luminescence continues
for at least 24 hours and signal intensity increases during the first hours.
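
The antibody and CSPD working solutions are specified as dilution ratios. The following Python sketch converts a ratio into a reagent volume, assuming that "1:N" means one part reagent in N parts of final solution; the 10 ml CSPD batch in the usage lines is a hypothetical volume, not one given in the procedure.

```python
def reagent_ul(final_volume_ml, dilution_factor):
    """Microliters of concentrate needed for a 1:dilution_factor working solution."""
    if dilution_factor < 1:
        raise ValueError("dilution factor must be at least 1")
    return final_volume_ml * 1000.0 / dilution_factor


if __name__ == "__main__":
    # Anti-DIG-AP conjugate at 1:10,000 in 75 ml of Buffer 2
    print("Antibody (ul):", reagent_ul(75, 10000))
    # CSPD at 1:200 in a hypothetical 10 ml of detection buffer
    print("CSPD (ul):", reagent_ul(10, 200))
```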

Example 3: Transformation of Carrot Cells

[2345] Transformation of plant cells can be accomplished by a number of methods, as described above. Similarly,
a number of plant genera can be regenerated from tissue culture following transformation. Transformation and
regeneration of carrot cells as described herein is illustrative.

[2346] Single cell suspension cultures of carrot (Daucus carota) cells are established from hypocotyls of cultivar
Early Nantes in B5 growth medium (O.L. Gamborg et al., Plant Physiol. 45:372 (1970)) plus 2,4-D and 15 mM CaCl2
(B5-44 medium) by methods known in the art. The suspension cultures are subcultured by adding 10 ml of the
suspension culture to 40 ml of B5-44 medium in 250 ml flasks every 7 days and are maintained in a shaker at 150 rpm at
27°C in the dark.

[2347] The suspension culture cells are transformed with exogenous DNA as described by Z. Chen et al., Plant Mol.
Biol. 36:163 (1998). Briefly, 4-days post-subculture cells are incubated with cell wall digestion solution containing 0.4
M sorbitol, 2% driselase, 5 mM MES (2-[N-Morpholino]ethanesulfonic acid), pH 5.0 for 5 hours. The digested cells are
pelleted gently at 50 xg for 5 min and washed twice in W5 solution containing 154 mM NaCl, 5 mM KCl, 125 mM CaCl2
and 5 mM glucose, pH 6.0. The protoplasts are suspended in MC solution containing 5 mM MES, 20 mM CaCl2, 0.5
M mannitol, pH [illegible], and the protoplast density is adjusted to about 4 x 10^5 protoplasts per ml.

[2348] [illegible] ug of plasmid DNA is mixed with [illegible] ml of protoplasts. The resulting suspension is mixed with 40%
polyethylene glycol (MW 8000, PEG 8000) by gentle inversion a few times at room temperature for 5 to 25 min.
Protoplast culture medium known in the art is added into the PEG-DNA protoplast mixture. Protoplasts are incubated
in the culture medium for 24 hours to [illegible] days, and cell extracts can be used for assay of transient expression of the
introduced gene. Alternatively, transformed cells can be used to produce transgenic callus, which in turn can be used
to produce transgenic plants by methods known in the art. See, for example, Nomura and Komamine, Plant Physiol.
79:988-991 (1985), Identification and Isolation of Single Cells that Produce Somatic Embryos in Carrot Suspension
Cultures.

[2349] The invention being thus described, it will be apparent to one of ordinary skill in the art that various
modifications of the materials and methods for practicing the invention can be made. Such modifications are to be considered
within the scope of the invention as defined by the following claims.

[2350] Each of the references from the patent and periodical literature cited herein is hereby expressly incorporated
in its entirety by such citation.



Claims 

1. An isolated nucleic acid molecule comprising a nucleic acid having a nucleotide sequence which encodes an amino
acid sequence exhibiting at least 40% sequence identity to an amino acid sequence encoded by

(a) a nucleotide sequence described in REF and/or SEQ Table 1 or 2 or a fragment thereof, or

(b) a complement of a nucleotide sequence shown in REF and/or SEQ Table 1 or 2 or a fragment thereof.

2. An isolated nucleic acid molecule comprising a nucleic acid having a nucleotide sequence which exhibits at least
85% sequence identity to

(a) a nucleotide sequence shown in REF and/or SEQ Table 1 or 2 or a fragment thereof, or

(b) a complement of a nucleotide sequence shown in REF and/or SEQ Table 1 or 2 or a fragment thereof.

3. An isolated nucleic acid molecule comprising a nucleic acid having a nucleotide sequence which exhibits at least
85% sequence identity to a gene comprising

(a) a nucleotide sequence shown in REF and/or SEQ Table 1 or 2 or a fragment thereof; or

(b) a complement of a nucleotide sequence shown in REF and/or SEQ Table 1 or 2 or a fragment thereof.

4. An isolated nucleic acid molecule which is the reverse of the isolated nucleotide sequence according to any one
of claims 1-3, such that the reverse nucleotide sequence has a sequence order which is the reverse of the sequence
order of said isolated nucleotide sequence according to any one of claims 1-3.

5. An isolated nucleic acid molecule comprising a nucleic acid capable of hybridizing to a nucleic acid having a
sequence selected from the group consisting of:

(a) a nucleotide sequence which is shown in REF and/or SEQ Table 1 or 2; and

(b) a nucleotide sequence which is complementary to a nucleotide sequence shown in REF and/or SEQ Table
1 or 2;

under conditions that permit formation of a nucleic acid duplex at a temperature from about 40°C and 48°C below
the melting temperature of the nucleic acid duplex.

6. The nucleic acid molecule according to any one of claims 1-5, wherein said nucleic acid comprises an open reading
frame.

7. The isolated nucleic acid molecule of any one of claims 1-5, wherein said nucleic acid is capable of functioning as
a promoter, a 3' end termination sequence, an untranslated region (UTR), or as a regulatory sequence.

8. The isolated nucleic acid molecule of claim 7, wherein said nucleic acid is a promoter and comprises a sequence
selected from the group consisting of a TATA box sequence, a CAAT box sequence, a motif of GCAATCG or any
transcription-factor binding sequence, and any combination thereof.

9. The isolated nucleic acid molecule of claim 7, wherein the nucleic acid sequence is a regulatory sequence which
is capable of promoting seed-specific expression, embryo-specific expression, ovule-specific expression,
tapetum-specific expression or root-specific expression of a sequence or any combination thereof.

10. A vector construct comprising a nucleic acid molecule according to any one of claims 1-9, wherein said nucleic
acid molecule is heterologous to any element in said vector construct.

11. A vector construct according to claim 10 comprising:

(a) a first nucleic acid having a regulatory sequence capable of causing transcription and/or translation; and

(b) a second nucleic acid having the sequence of said isolated nucleic acid molecule according to any one of
claims 1-4;

wherein said first and second nucleic acids are operably linked and wherein said second nucleic acid is
heterologous to any element in said vector construct.

12. The vector construct according to claim 11, wherein said first nucleic acid is native to said second nucleic acid.

13. The vector construct according to claim 11, wherein said first nucleic acid is heterologous to said second nucleic
acid.

14. A vector construct according to claim 10 comprising:

(c) a first nucleic acid having the sequence of said isolated nucleic acid molecule according to claim
7; and

(d) a second nucleic acid;

wherein said first and second nucleic acids are operably linked and wherein said first nucleic acid is heterologous
to any element in said vector construct.

15. The vector construct according to claim 14, wherein said first nucleic acid is native to said second nucleic acid.

16. The vector construct according to claim 14, wherein said first nucleic acid is heterologous to said second nucleic
acid.

17. A host cell comprising an isolated nucleic acid molecule according to any one of claims 1-4, wherein said nucleic
acid molecule is flanked by exogenous sequence.

18. A host cell comprising a vector construct of any one of claims 10-16.

19. An isolated polypeptide comprising an amino acid sequence

(a) exhibiting at least 40% sequence identity of an amino acid sequence encoded by a sequence shown in
REF and/or SEQ Table 1 or 2 or a fragment thereof; and

(b) capable of exhibiting at least one of the biological activities of the polypeptide encoded by said nucleotide
sequence shown in REF and/or SEQ Table 1 or 2 or a fragment thereof.

20. The isolated polypeptide of claim 19, wherein said amino acid sequence exhibits at least 75% sequence identity
to an amino acid sequence encoded by a sequence shown in SEQ Table 1 or 2 or a fragment thereof.

21. The isolated polypeptide of claim 19, wherein said amino acid sequence exhibits at least 85% sequence identity
to an amino acid sequence encoded by a sequence shown in SEQ Table 1 or 2 or a fragment thereof.

22. The isolated polypeptide of claim 19, wherein said amino acid sequence exhibits at least 90% sequence identity
to an amino acid sequence encoded by a sequence shown in SEQ Table 1 or 2 or a fragment thereof.

23. An antibody capable of binding the isolated polypeptide of any one of claims 19-22.

24. A method of introducing an isolated nucleic acid into a host cell comprising:

(a) providing an isolated nucleic acid molecule according to any one of claims 1-4; and

(b) contacting said isolated nucleic acid with said host cell under conditions that permit insertion of said nucleic
acid into said host cell.

25. A method of transforming a host cell which comprises contacting a host cell with a vector construct according to
any one of claims 10-16.

26. A method of modulating transcription and/or translation of a nucleic acid in a host cell comprising:

(a) providing the host cell of claim 24 or 25; and

(b) culturing said host cell under conditions that permit transcription or translation.

27. A method for detecting a nucleic acid in a sample which comprises:

(a) providing an isolated nucleic acid molecule according to any one of claims 1-5;

(b) contacting said isolated nucleic acid molecule with a sample under conditions which permit a comparison
of the sequence of said isolated nucleic acid molecule with the sequence of DNA in said sample; and

(c) analyzing the result of said comparison.

28. The method according to claim 27, wherein said isolated nucleic acid molecule and said sample are contacted
under conditions which permit the formation of a duplex between complementary nucleic acid sequences.

29. A plant or cell of a plant which comprises a nucleic acid molecule according to any one of claims 1-4 which is
exogenous to said plant or plant cell.

30. A plant or cell of a plant which comprises a nucleic acid molecule according to any one of claims 1-4, wherein said
nucleic acid molecule is heterologous to said plant or said cell of a plant.

31. A plant or cell of a plant which has been transformed with a nucleic acid molecule according to any one of claims 1-4.

32. A plant or cell of a plant which comprises a vector construct according to any one of claims 10-16.

33. A plant or cell of a plant which has been transformed with a vector construct according to any one of claims 10-16.

34. A plant which has been regenerated from a plant cell according to any one of claims 29-33.
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